(54) Method and device for prediction of the reliability of software programs



(57) The present invention relates to a method and
device for prediction of the reliability of software programs,
in particular of the number of failures occurring
during test and the number of faults remaining in a software
program after test. By collecting data about the
testing effort and the number of failures provoked during
the test as well as software maturity data, predictions
on the future number of failures to be found during further
testing or even predictions on the number of faults
remaining in the software after testing are possible using
a statistical model.
Description

[0001] The present invention relates to a method and a corresponding device for failure prediction of software programs,
in particular of the number of failures occurring during test and the number of faults remaining in a software
program after test, a computer program for implementing said method on a computer and to a data carrier carrying
such a computer program.

[0002] In many software development processes systematic software tests are the last instance of quality assurance
before delivering the software to the end user. Thus, during the test phase it is important to determine when the quality
of the software under test will allow delivery. In detail, the number of software faults that remain in the software at the time
of delivery has a high influence on the total quality of the software. Unfortunately, it is not possible to produce software
that is generally free of faults or even to determine exactly the number of faults in a given piece of software.
[0003] However, testing as a typical method of random sampling can lead to statistical estimations about the quality
of the software. By collecting data about the testing effort and the number of failures provoked during the test, predictions
of the future number of failures to be found during further testing or even of the number of faults remaining in the
software after testing should be possible.

[0004] Currently several prediction models are published which follow the same basic principle. During test execution
data on the test progress and data on the failure occurrences, i.e. the number of failures during a given test progress
interval, are collected. For some models the source and severity of the failures are also documented. The resulting data
sets are then extrapolated using a statistical model to predict how many failures will occur when the testing is continued
and to estimate how many faults remain in the software after test.

[0005] Although the models differ in detail, a number of general characteristics can be identified:

• All models measure the test progress either

■ using the real (calendar) time of the testing interval,

■ using the net amount of time spent for testing (i.e. the time when test cases were actually executed, summarised
over all test locations and test persons) or

■ using the coverage of the software under test achieved by the test cases, e.g. the number of code elements activated
by the tests related to the total number of code elements in the software (so-called "white-box coverage" or
"code coverage").

• The models assume specific test strategies:

■ For operational testing, i.e. executing the particular functions of the software in the relative frequencies that
are derived from the expected usage profiles after release.

■ For systematic testing, i.e. for structured execution of all functions with minimal testing effort. The goal is to
reduce redundancy of the test execution, i.e. basically to reduce the amount of code that is activated more
than once during testing.

[0006] The vast majority of models have been derived for operational testing, while nearly no models exist for
systematic testing.

[0007] For details on existing models reference is made to Grottke, M.: Software Reliability Model Study, Deliverable
A.2, Project PETS, IST 1999-55017, June 2001 (pp. 6-20) and Grottke, M., Sohnlein, D.: Justified Model Selection,
Deliverable A.4, Project PETS, IST 1999-55017, October 2001 (pp. 5-21).

[0008] All existing models strongly depend on large data sets, i.e. the quality of the predictions is related to the
amount of input data. As a consequence, these models are only usable for large-scale software systems with
long-lasting development projects. Each model contains specific assumptions on the software development and testing
processes, on the structure of collected data and on the testing strategy used. They will in consequence produce
inadequate predictions if those assumptions are not fully matched by the real testing activities. Thus, models which
are based on operational testing are not suitable for test projects which employ systematic testing.
[0009] Additionally, there is little or no standard software known on the market that implements some or all of these
models and that is designed to be adapted to existing test management or test automation software. Thus, in most
cases statistics software is used for prediction purposes, which requires a trained statistician to reformat, import,
analyse and process the data delivered from the testing project.

[0010] The object of the present invention is to develop and to implement processes and software for the prediction
of software failures during test and for the prediction of faults remaining in the software after test that are usable for
small and medium enterprises (SMEs). SMEs would strongly profit from such predictions and the resulting potential
for improving their software development processes, e.g. minimizing the testing time.
[0011] This object is achieved according to the present invention by a method as claimed in claim 1 and a device as
claimed in claim 20. The invention also relates to a computer program as claimed in claim 21 and to a data carrier as
claimed in claim 22.

[0012] According to the invention a method is proposed comprising the steps of: 

- collecting the test data and the number of failures occurring during testing and/or collecting the software maturity
data and project-specific data from developers or testers of the software,

- processing the collected data by a data analysis, a parameter estimation, a data compression and/or a prediction
to obtain failure prediction data, and

- outputting said failure prediction data.

[0013] The invention is based on two main ideas: 



■ Firstly, test data from systematic testing is used to predict the number of failures occurring during testing and the
number of faults remaining in the software after test. To model different levels of redundancy during the test process
a set of statistical models has been developed to explicitly consider and account for different redundancy levels.
The basic partial redundancy model, as well as the first and second extended partial redundancy models, are the first models
which specifically account for the characteristics of systematic testing and incorporate different redundancy levels.

■ Secondly, predictions on the number of failure occurrences during test, as well as the number of faults remaining
in the software after test, can also be made before testing has started, solely on data characterizing the software
process maturity and on project-specific data. The software maturity data is collected using a questionnaire which
describes the quality of relevant processes following the SPICE model.

■ Using elaborate techniques the maturity data coming from the questionnaire has been analysed and the most
influential factors related to the failure occurrence during test and the number of faults remaining in the software
after test have been determined.

[0014] To enable SMEs to use software reliability predictions the following requirements have been met by the
invention:



• Since most SMEs perform systematic testing, the new statistical model is designed for this testing strategy.
Systematic testing means reducing the redundancy of tests. This strongly depends on the specific testing skills of the
SME. Thus, the model is able to deal with different testing redundancy levels.

• Instead of using different statistical models, which forces the user to understand and to implement each of the
models, a small set of models has been developed which are flexible enough to adapt to different software
development and testing processes and to different development project characteristics.

• The models produce high-quality, a priori predictions with few or even no test progress data, and are able to refine
these predictions step-by-step when more test progress data becomes available during the testing activities; thus
they are usable for small development projects.

• Since the software development maturity level also varies between the SMEs, the model is also able to deal with
different qualities of software development processes, i.e. with different initial fault densities in the software.

• The model can be encapsulated in user-friendly standard software that should provide open interfaces to
market-leading test management software; this software should be usable by test managers and testers, not only by
statisticians.

[0015] Finally, the invention is designed and implemented to

• be available to all SMEs independent of their technical platforms and back-office structures,

• have open interfaces to import and export testing progress and failure occurrence data,

• implement a simple method of gathering software process maturity data and project-specific data,

• condense these maturity data and integrate them into the new statistical model,

• implement the new models transparently so that a non-statistician can use them efficiently,

• generate and visualize predictions for software reliability.

[0016] Preferred embodiments are defined in the dependent claims.

[0017] A first preferred embodiment and variations thereof are defined in claims 2, 7 and 9. In these embodiments
the importance of software process maturity data as well as the means of gathering this data is stressed. The
advantage of using a questionnaire which follows the SPICE standard is that the standard is well-known and a certain
knowledge can be expected. Following the SPICE standard, ratings from official assessments can be used within the
models. If no ratings from assessments are available, each interested party can conduct a self-assessment with the
help of the questionnaire and the written scenarios. With the help of the detailed scenarios even inexperienced users
can infer a maturity rating.

[0018] Another preferred embodiment is defined in claim 14. The questionnaire developed within this invention has
been further analysed using a univariate and/or multivariate and/or correlation analysis. Thus, the processes with the
highest influence on the failure rates have been determined. As a consequence the questionnaire will be further
condensed, reducing the time necessary for the self-assessment of the software process maturity.
[0019] Further embodiments of the invention are defined in claims 17, 18 and 19. The redundancy of the test process
depends on the skill level of the tester. Using the zero inflated binomial model, as well as the maximum likelihood
method for parameter estimation, it was possible to develop a model that explicitly accounts for different redundancy
models. The FEPR and SEPR are the first models which explicitly consider this characteristic of systematic testing.
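For orientation only, the textbook form of a zero inflated binomial distribution can be sketched as follows. The parameter names n, q and pi0 are hypothetical and are not taken from this description; how the invention maps the redundancy level onto these parameters is not stated in this passage, so the sketch only illustrates the distribution itself: with probability pi0 the outcome is a structural zero, otherwise it follows an ordinary binomial distribution.

```python
from math import comb


def zib_pmf(x, n, q, pi0):
    """Probability P(X = x) of a zero inflated binomial distribution.

    n   -- number of trials (hypothetical name)
    q   -- success probability of the binomial part (hypothetical name)
    pi0 -- probability of a structural zero (hypothetical name)
    """
    if not 0 <= x <= n:
        return 0.0
    # Ordinary binomial probability of observing x successes in n trials.
    binomial = comb(n, x) * q**x * (1 - q) ** (n - x)
    # The extra mass pi0 is placed on zero only.
    return pi0 * (x == 0) + (1 - pi0) * binomial


if __name__ == "__main__":
    for x in range(4):
        print(x, zib_pmf(x, n=10, q=0.3, pi0=0.2))
```

Setting pi0 to zero recovers the plain binomial distribution, which is one way to check such a routine.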
[0020] Other preferred embodiments are defined in claims 20, 21 and 22. The model and its theoretical derivation
are of no practical use for SMEs, as it is very hard for non-statisticians to transform the mathematical model into computer
routines and validate their correctness. Thus, the device, the computer program and the data carrier are of utmost
importance. An easy-to-use prototype is absolutely necessary for usage in industrial projects.
[0021] The following results and embodiments were obtained by the present invention:

• the so-called "basic", "first extended" and "second extended" "partial redundancy models" (see sections 2.3 and
2.4 below), which are designed for systematic testing and can dynamically adapt to different levels of redundancy.
These new models enable companies which apply systematic testing to their software product to predict the number
of failure occurrences during test and the number of faults remaining in the software after test.

• the "first extended partial redundancy model" (FEPR) uses test progress data in the form of test cases executed
(black box coverage) and failure occurrence data as input. The FEPR allows predictions on the basis of relatively
simple test progress data like "test cases executed" and failure occurrence data.

• the "second extended partial redundancy model" (SEPR) employs test progress data in the form of code coverage, i.e. the
number of code constructs executed by the tests, and failure occurrence data. The SEPR allows more precise
predictions on the basis of test progress data which employs coverage information and failure occurrence data.
Gathering test progress data of this kind is more elaborate and time-consuming. However, it enables more precise
predictions.

• methods for the dynamic and static estimation of the

• number of failures that will have occurred after i executed test cases;

• number of faults remaining in the software after i executed test cases;

• number of faults remaining after executing x additional test cases;

• number of additional test cases to be executed until x faults remain.

The four different measures listed here provide a test manager or project manager with information upon which
he can plan programming and testing resources and effort. Based on this information he can conduct a risk
analysis and can plan for failures occurring after release of the software.

• dynamic estimates are based on test progress and failure occurrence data using the FEPR and SEPR. The dynamic
estimates can be used from the beginning of the test process. They can be recalculated and evaluated with every
new data set coming in.

• static estimates, i.e. estimates before testing starts, are based on the software process maturity data and
project-specific data. The static estimates allow a prediction of the number of failures to be expected before testing even
starts. They are a first, early estimate of what to expect during the test process.

• the so-called "PETS questionnaire" (see section 3.1 below) which gathers the relevant process maturity data by
means of a multiple-choice questionnaire and thus replaces a time-consuming assessment. The advantage of the
questionnaire is that all relevant data can be collected in a reasonable amount of time. Every question is
characterized with a set of scenarios. Thus, the user does not have to be a trained assessor to fill out the questionnaire.

• methods for selecting the most influential software process maturity information and for linking them to the test
progress and failure occurrence data for prediction purposes.
Using different analysis techniques the most important processes have been identified. Thus, the questionnaire
will be further condensed, again reducing the time and effort necessary to gather the maturity data.

• a software, the so-called "PETS prototype", implemented in Java, which combines data import and export, statistical
model and parameter estimation, and questionnaire into user-friendly standard software. One incentive of the
invention is to allow SMEs to use such prediction models without having to employ an experienced statistician.
Therefore, software was needed that allows an easy application of these models within a wide range of companies.

[0022] The present invention will now be explained in more detail with reference to the figures in which

Fig. A shows a flow-chart of the method according to the invention, and

Fig. B shows a block diagram illustrating the general idea/workflow of the invention.

[0023] The general idea on which the invention is based and the workflow used according to the invention are
schematically depicted in Fig. A. The input data, the predictions, as well as the output data are depicted.

[0024] Starting with the input data, the four different types are shown: test progress data from systematic testing,
failure data, maturity data and project-specific data. In addition the source for the data is also illustrated. The predictions
made with the help of a PETS prototype are briefly listed with the main models developed: First Extended Partial
Redundancy Model and Second Extended Partial Redundancy Model. The output of the PETS prototype comprises
the different static and dynamic estimates on the number of failures occurring during test, the number of faults remaining
in the software after test, the number of faults remaining after executing additional test cases, as well as the testing
effort necessary to reduce the number of faults to a predefined level.

[0025] A main part of the invention is the development of the different partial redundancy models which are tailored
for systematic testing.

[0026] In order to consider partial redundancy in code construct execution, the basic partial redundancy model
defines three states a code construct may take: "untested" (U), "already tested with the possibility of being tested again
in the future" (T), and "tested and eliminated from further consideration" (E).
[0027] The assumptions of the model are as follows: 

1. The program under test consists of G code constructs. At the beginning of testing, all these constructs are in
state U.

2. Per test case exactly p constructs are sensitised.

3. The p constructs are randomly chosen from those constructs residing in state U or in state T at the beginning
of the test case execution.

4. A constant fraction r = k/p (k ∈ {0, 1, ..., p}) of those constructs exercised by a test case changes to (or stays
in) state T and may be tested again in the future. The other constructs are eliminated and take state E.

[0028] The basic partial redundancy model can be extended to include fault detection and correction. This requires
the distinction between faulty and correct code constructs. Thus, six different states have to be defined: "untested and
correct" (UC), "untested and faulty" (UF), "tested and correct" (TC), "tested and faulty" (TF), "eliminated and correct"
(EC), and "eliminated and faulty" (EF). Consequently, the four assumptions of the basic model have to be reformulated:




EP1420344 A2 

1. The program under test consists of G code constructs. At the beginning of testing, u_0 of these constructs are in
state UF, the remaining (G - u_0) constructs are in state UC.

2. Per test case exactly p constructs are sensitised.

3. The p constructs are randomly chosen from those constructs residing in one of the states UF, UC, TF or in state
TC at the beginning of the test case execution.

4. A constant fraction r = k/p (k ∈ {0, 1, ..., p}) of those constructs exercised by a test case changes to
(or stays in) state T and may be tested again in the future. The other constructs are eliminated and take state E.

Two extended partial redundancy models have been derived on the basis of the above assumptions. The
models differ in the way the activation probability is treated.
The first extended partial redundancy model (FEPR) assumes:

5. When a code construct at which a fault is located is exercised for the first time, the fault causes a failure with
the activation probability s (0 < s <= 1). The fault is then removed instantaneously and perfectly. If no failure occurs
during the first execution of the code construct, then the fault will not be detected until the end of testing.

[0029] The second extended partial redundancy model (SEPR) assumes a constant activation probability during
each execution of the faulty code construct: 5'. When a code construct at which a fault is located is exercised for the
first time or repeatedly, the fault causes a failure with a constant activation probability s (0 < s <= 1). The fault is then
removed instantaneously and perfectly.
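The difference between the two extended models can be made concrete with a simulation sketch. It is an illustration under assumptions and not the likelihood-based estimation used by the invention: the function and parameter names are hypothetical, the first u0 constructs are simply labelled as faulty (the random selection makes the labelling irrelevant), the k constructs that stay testable are chosen uniformly at random, and a test case exercises all remaining constructs when fewer than p are available.

```python
import random


def simulate_failures(G, u0, p, k, s, n_test_cases, model="FEPR", seed=0):
    """Return the cumulative number of failures after each test case.

    G, p, k -- as in the basic partial redundancy model
    u0      -- number of initially faulty code constructs
    s       -- activation probability, 0 < s <= 1
    model   -- "FEPR": a fault can only fail at the first execution;
               "SEPR": a fault can fail at every execution
    """
    if model not in ("FEPR", "SEPR"):
        raise ValueError("model must be 'FEPR' or 'SEPR'")
    rng = random.Random(seed)
    status = ["U"] * G                      # U, T or E
    faulty = [c < u0 for c in range(G)]     # True while a fault is present
    failures = 0
    cumulative = []
    for _ in range(n_test_cases):
        available = [c for c in range(G) if status[c] != "E"]
        chosen = rng.sample(available, min(p, len(available)))
        for position, c in enumerate(chosen):
            first_execution = status[c] == "U"
            can_fail = faulty[c] and (first_execution or model == "SEPR")
            if can_fail and rng.random() < s:
                failures += 1
                faulty[c] = False           # instantaneous, perfect removal
            status[c] = "T" if position < k else "E"
        cumulative.append(failures)
    return cumulative


if __name__ == "__main__":
    for name in ("FEPR", "SEPR"):
        result = simulate_failures(G=200, u0=20, p=10, k=5, s=0.5,
                                   n_test_cases=40, model=name)
        print(name, "failures observed:", result[-1])
```

Faults that are still marked as faulty at the end of a run correspond to the faults remaining in the software after test.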

[0030] Both extended partial redundancy models employ the maximum likelihood method for estimating a likelihood
function. Both models calculate static and dynamic estimates of the number of failures to occur during testing and the
number of faults remaining in the software after test.

[0031] For the dynamic estimates FEPR uses test progress data in the form of black box coverage as input, while SEPR
depends on test progress data in the form of code coverage data. With respect to the static estimates there exists no
difference in the necessary input data.

[0032] The general idea of the processing of the maturity data and project-specific data, as well as the development
of the static estimates is shown in Fig. B. Therein the different input data, the main processing component, as well as
the output are depicted, analogous to Fig. A.

[0033] In order to allow static estimates which are only based on software process maturity data and project-specific data,
these two types of data are necessary input. Besides these data two additional classes are used: test progress data
and failure data. These two additional types of input data allowed an elaborate analysis (univariate/multivariate/correlation
analysis) of the data. Thus, the parts of the maturity and project-specific data relevant for the failures could be
identified and incorporated into the formulas for the static estimates (which are the output of this analysis step).
[0034] During the development of the statistical models a method was developed to select the most influential
software process maturity information and to link it to the test progress and failure occurrence data.
[0035] To deal with different programming and testing skills, the invention is based on the idea to use a well-known
software process maturity model to integrate its quantitative measures of the processes into the statistical model. CMM
and SPICE are examples of such maturity models, see also Grottke, M.: Process Maturity Model Study, Deliverable
A.3, Project PETS, IST 1999-55017, July 2001. Using this maturity information, the new statistical model should be
able to deliver a priori estimations of fault rates in the software, i.e. to generate predictions with few or even no testing
progress and failure occurrence data available.

[0036] Unfortunately, gathering software maturity information from a company to evaluate its CMM or SPICE level
is an expensive and time-consuming task (a so-called "assessment") which is rarely performed by SMEs. So the model
should be designed to use official assessment data if present, but it should also be capable of using information that
would be gathered using a much simpler method. Since a software process maturity model like SPICE contains several
hundred criteria, it was clear that not all of them could be considered. In fact, by analysing a number of experiments
performed during real-life software development and testing projects, a minimized subset of SPICE subprocesses was
selected that are significant for fault distribution over all SMEs and all possible development projects. In addition
project-specific data (also called environmental factors) were identified that have an influence on the fault distribution.
[0037] The SPICE processes assessed within the PETS questionnaire are 

Identifier   Process name
ENG.1.1      System requirements analysis and design process
ENG.1.2      Software requirements analysis process
ENG.1.3      Software design process
ENG.1.4      Software construction process
ENG.1.5      Software integration process
ENG.1.7      System integration and testing process
MAN.2        Project management process
MAN.3        Quality management process
MAN.4        Risk management process
SUP.1        Documentation process
SUP.2        Configuration management process
SUP.3        Quality assurance process
SUP.4/5      Verification process / Validation process
SUP.6        Joint review process
SUP.8        Problem resolution process
CUS.1        Acquisition process
CUS.2        Supply process
CUS.3        Requirements elicitation process
ORG.2.1      Process establishment process
ORG.2.3      Process improvement process
ORG.3        Human resource management process
ORG.4        Infrastructure process
ORG.5        Measurement process
ORG.6        Reuse process

[0038] In addition to the SPICE processes, data on project-specific details is collected in the PETS questionnaire.
The project-specific details comprise size of the software program, development effort, skill level of programmers and
testers, etc.

[0039] Given the data from several real-life software development and testing projects, the influence of the software
process maturity and project-specific data (also called environmental factors) on the estimated number of inherent
faults and on the redundancy level was analysed. Univariate analysis, e.g. using Goodman's and Kruskal's γ measure (I. Klein
and M. Missong, Deskriptive Statistik, Vorlesungsskript, Universität Erlangen-Nürnberg, Nürnberg 2002), was conducted,
as well as multivariate analyses (see section 3.3 below). The zero inflated binomial model was used for
modelling the redundancy level for predictions generated by the partial redundancy models (see section 3.3.2.2 below).
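As an illustration of the univariate analysis mentioned above, Goodman's and Kruskal's γ for two ordinal variables is the difference between the numbers of concordant and discordant pairs of observations divided by their sum, with tied pairs ignored. The following sketch implements this standard definition; the function name and the sample data are hypothetical and do not stem from the PETS projects.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Goodman's and Kruskal's gamma for two equally long ordinal sequences."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables order the pair the same way
        elif direction < 0:
            discordant += 1      # the variables order the pair oppositely
        # Pairs tied in x or y contribute to neither count.
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when all pairs are tied")
    return (concordant - discordant) / (concordant + discordant)


if __name__ == "__main__":
    # Hypothetical data: maturity rating of one process and an ordinal
    # fault-density class observed in five projects.
    maturity_rating = [1, 2, 2, 3, 4]
    fault_class = [4, 3, 4, 2, 1]
    print(goodman_kruskal_gamma(maturity_rating, fault_class))
```

Values near +1 or -1 indicate a strong monotone association between the two variables, values near 0 indicate little association.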
[0040] As a result the importance of the SPICE subprocesses ENG.1.3, ENG.1.4, SUP.3, SUP.4, SUP.8, ORG.3 and
ORG.4 was determined. With respect to the environmental factors a high correlation was determined for the
development runtime performance index DPRI (ratio of the actual development runtime to the planned development runtime)
and the percentage of requirements changed after the specification phase PRCH.

[0041] The invention provides methods, devices, computer programs and data carriers as defined in the claims
which include - alone or in combination - the features of:

1. using the zero inflated binomial model for modelling the redundancy level for predictions generated by the partial
redundancy model (see section 3.3.2.2 below);

2. using univariate, multivariate and correlation analysis for selecting the maturity data
that have the highest influence on the failure occurrences (see section 3.3 below);

3. using the maximum likelihood method for estimating a likelihood function for the partial redundancy model (see
section 2.5 below).

[0042] The features described above and in the description below are not only essential for the invention in the
combinations in which they are described but also in other combinations or in isolation.
1 Introduction

1.1 Project overview 

[0043] The PETS project is a two-year research and development effort on the part of a consortium of four industrial
bodies and one academic body, partially funded by the European Commission under the Fifth Framework Agreement
(CRAFT project). The objective of PETS is to develop a new, enhanced statistical model which predicts the reliability
of software programs based on test and software maturity results. This new model will be implemented as a prototype which
allows small and medium enterprises, especially small software companies, to predict the reliability of their developed
and tested software in an easy and efficient manner with a higher accuracy than currently possible.

1.2 This document

[0044] This document consists of two main parts. In the first one (chapter 2), the assumptions of existing software
failure models concerning structural coverage growth are generalized for the derivation of a model that is more
appropriate for systematic testing. This partial redundancy model is then extended in two different ways to include the
development of the number of failure occurrences. The model properties and parameter estimation techniques are
studied.

[0045] In the second part (chapter 3), parameters of the partial redundancy model are related to software
development and test process maturity ratings and to other environmental factors.

1.3 Related documents

[0046] The unifying model framework that serves as a starting point of this document was first derived in the Software
Reliability Model Study (deliverable A.2) [14] and enhanced in more recent articles [12, 16]. As for the assessment of
software development and test process maturity for the purposes of the PETS project, this document builds on work
done in the Software Process Maturity Model Study (deliverable A.3) [13] and in the Justified Model Selection
(deliverable A.4) [18]. In the latter document, several of the projects used for the analysis of associations between parameters
of the partial redundancy model and environmental factors are described in detail. Furthermore, information on all data
sets used here can be found in the Baseline Experiment Reports (deliverables C.1 - C.4) [2, 5, 19, 21].

2 The partial redundancy models 

[0047] In this chapter, different variations of a software failure model with partial redundancy are developed, and 
their properties are investigated. In the first subsection, the general problem of analyzing failure data collected during 
systematic testing is described briefly. 

2.1 Systematic testing and software reliability models 

[0048] Like the vast majority of industrial organizations, the small and medium enterprises participating in the PETS
project employ systematic strategies for software testing. One goal of such approaches is to find as many faults as
possible in the software product within the given time and budget constraints. In order to achieve this, amongst other
things, the testers are to minimize repeated execution of code areas. Even if the code of the software is not available,
testing regimes like equivalence partitioning [26, 27] may guide them in defining test runs that are thought to execute
code blocks or paths not tested before. Furthermore, special focus is put on input sequences expected to be error
prone (for example, boundary values of equivalence classes [26, 27]), even if they should rarely occur in normal
operation.


[0049] Obviously, the failure pattern encountered during systematic testing is greatly different from the one a "normal"
user would have seen. Therefore, without additional information on the fault distribution in the software code and the
execution probabilities in the operational profile and the testers' profile of software usage, the reliability of the software
(as perceived by a user) cannot be determined based on failure data collected during systematic testing. However,
predicting the number of failure occurrences until the end of the testing phase or the number of faults remaining in the
software should be possible.

[0050] Practically all of the existing software reliability growth models implicitly assume that the failure data analyzed
were collected during operational testing. Not only does this proposition ensure that the failure pattern experienced is
related to typical software usage, but the shapes of the mean value functions of these models also rely on this
assumption. Therefore, applying a classical software reliability growth model to failure data collected during systematic
testing can hardly be expected to yield a good fit and/or trustworthy parameter estimates. Investigating how the shape
of a classical model is related to operational testing may suggest how to adapt the model in order to use it in
environments where systematic testing is employed.

[0051] According to the model frameworic first developed in the Software Reliability Model Study [14] and refined In 
more recent woric [12. 15], a number of finite failures category models can be interpreted in terms of four consecutive 
relationships: 

I. The allocation of testing effort f to calendar time t 

II. The development of test case coverage ^ as a function of cumulative testing effort 

III. The expected coverage of code constructs k = £(Q attained through test case coverage 

IV. The relationship betwe^ expected structural coverage and the expected number of failures experienced \i 

PK)52] Comparingoperationalandsystenrmttetesting, especiallytheway in which code construct coverage is attained 
by the different testing technk^ues seems to differ a tot. Therefore, relationship III is explored further in the following 
section. 

2J2 Phwowarski et al. versus Rivers and Vbuk 

[0053] Both the block coverage model by PlwowarsW et al. and the hypergeometric model for systematte testing by 
Rivers and Vouk have already been discussed in the Software Reliability Model Study [14J. Generalizing the block 
coverage concept to a model of code construct coverage, PiwowarskI et al. [36] make the following assumptions with 
respect to relationship III: ' 

1 . The program under test consists of G code constructs. 

2. Per test case, p of these constructs are sensitized on average. 

3. The p code constructs are always chosen from the entire population. The fad that a constmct has already been 
sensitized does not diminish its chances of being sensitized In the future. 

[W)54] This setup resembles operattonal testing with a homogeneous operational profile, in which all equally-sized 
operations have the same occurrence probability. As shown In [14], expected code coverage as function of the number 
of test cases executed takes an exponential forni closely related to the Jelinski-Moranda model or the Goel-Okumoto 



[0055] The central proposltton of the Rivers-Vouk model [37. 38] as one of the few software failure models explicitly 



eratton. 



model: 




(1) 
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designed for systematfc testing is that repeated executions of code constructs are completely avoided. While the first 
two assunnptlons of the approach by Plwowarski et al. apply if the test cases are equally^ized, the third one has to be 
changed as follows: 

3. Code constructs which have already been exercised are not tested a second time. Consequently, according 
to the Rivers-Voui< model relationship III is not stochastic but detemiinlstic, and It is nothing but a straight line: 



(Due to this assumed proportionality, Rivers and Vouk equivalently use the number of test cases executed or the 
number of code constructs covered as the model input.) 

[0056] The two shapes of code coverage as a function of the number of test cases executed are depicted in figure 1. Since the two models are concerned with the extreme cases of perfect avoidance of re-execution of code constructs and full potential redundancy, it seems reasonable to try to integrate them in a more general model that also allows for the intermediate cases of partial redundancy in sampling code constructs. This model could bridge the gap between approaches for systematic testing and operational testing.

Figure 1: Structural coverage growth in the approach by Piwowarski et al. and in the Rivers-Vouk model

[0057] Such a model will be developed in the next subsections.

2.3 The basic partial redundancy model

[0058] In order to consider partial redundancy in code construct execution, it seems useful to define three different states code constructs may take: "untested" (U), "already tested with the possibility of being tested again in the future" (T) and "tested and eliminated from further consideration" (E). The assumptions of the model are as follows:

1. The program under test consists of G code constructs. At the beginning of testing, all these constructs are in state U.

2. Per test case, exactly p constructs are sensitized.

3. The p constructs are randomly chosen from those constructs residing in state U or in state T at the beginning of the test case execution.

4. A constant fraction $r = k/p$ ($k \in \{0, 1, \dots, p\}$) of those constructs exercised by a test case changes to (or stays in) state T and may be tested again in the future. The other constructs are eliminated and take state E.

By formulating the assumptions slightly more rigorously than the propositions of the Rivers-Vouk model and the approach by Piwowarski et al. (e.g. by proposing that the number of code constructs executed per test case is exactly equal to p), it will be possible to derive not only the expected code construct coverage after i test cases, $\kappa(i)$, but also the full probability distribution of the number of code constructs sensitized after i test cases, $Q_i$.
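The four assumptions can be read as a sampling procedure, which the following Python sketch simulates for a single test run. It is an illustration only, not part of the method described here: the function name `simulate_basic_model`, the use of Python's `random` module and the fixed seed are assumptions made for this example, and `k` is the integer from assumption 4, so that r = k/p.

```python
import random


def simulate_basic_model(G, p, k, n_tests, seed=0):
    """Simulate one test run of the basic partial redundancy model.

    Hypothetical helper: G constructs, p constructs per test case, and
    k = r * p sampled constructs per test case that stay re-testable.
    Returns the number of covered constructs after each test case.
    """
    rng = random.Random(seed)
    untested = set(range(G))  # state U
    tested = set()            # state T (may be executed again)
    eliminated = set()        # state E (never executed again)
    coverage = []
    for _ in range(n_tests):
        pool = sorted(untested | tested)
        if len(pool) < p:
            break  # not enough constructs left for a full test case
        sample = rng.sample(pool, p)       # assumption 3
        stay = set(rng.sample(sample, k))  # assumption 4
        for construct in sample:
            untested.discard(construct)
            tested.discard(construct)
            if construct in stay:
                tested.add(construct)
            else:
                eliminated.add(construct)
        coverage.append(len(tested) + len(eliminated))
    return coverage


if __name__ == "__main__":
    print(simulate_basic_model(G=100, p=10, k=0, n_tests=10))   # r = 0
    print(simulate_basic_model(G=100, p=10, k=10, n_tests=10))  # r = 1
```

With k = 0 no construct is ever re-executed, so coverage grows by exactly p per test case; with k = p nothing is eliminated and coverage growth slows down as testing proceeds.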

[0059] The setup illustrated above can be viewed as a discrete-time Markovian population model, more specifically as a variation of the vector Markov process described by Howard [22]. While the state of the entire software is given by the number of untested, tested and eliminated code constructs, the behavior of each unit (the code constructs) is governed by the same Markov process; at the "atomic" test case executions a code construct may switch from one of the states U, T and E to another one. In figure 2, these states and the corresponding transition probabilities for the i-th test case are depicted.

Figure 2: Markov graph of the basic partial redundancy model

While $\bar{r}$ stands for $1 - r$, $z_i$ denotes the selection probability for each of the constructs in state U and state T during the i-th test case. Defining $G_{A,i-1}$ as the number of code constructs in state A ($A \in \{U, T, E\}$) after execution of the (i-1)-th test case, this probability is

$$z_i = \frac{p}{G_{U,i-1} + G_{T,i-1}}$$

Obviously, $z_i$ and the related transition probabilities are not only time-variant, they also depend on the condition of all code constructs. In contrast to the systems discussed in [22], the units are not acted upon independently.

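2.3.1 Approximated expected structural coverage growth

[0060] We are interested in $E\left(\frac{G_{T,i} + G_{E,i}}{G}\right)$, the expected code coverage achieved as a function of the number of test cases executed, i.

[0061] Due to the structure of the model, it is possible to formulate the expected number of code constructs in the three states in a recursive way:

$$\begin{pmatrix} E(G_{U,i}) \\ E(G_{T,i}) \\ E(G_{E,i}) \end{pmatrix} = \begin{pmatrix} 1 - \zeta_i & 0 & 0 \\ \zeta_i r & 1 - \zeta_i \bar{r} & 0 \\ \zeta_i \bar{r} & \zeta_i \bar{r} & 1 \end{pmatrix} \begin{pmatrix} E(G_{U,i-1}) \\ E(G_{T,i-1}) \\ E(G_{E,i-1}) \end{pmatrix} \tag{3}$$

where $\zeta_i$ denotes the selection probability during the i-th test case, given the (i-1)-th test case resulted in the expected allocation of the G code constructs to the three states, i.e.,

$$\zeta_i = \frac{p}{E(G_{U,i-1}) + E(G_{T,i-1})}$$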
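The recursion for the expected state counts can be iterated directly. The sketch below is an illustrative Python version under stated assumptions: the function name `expected_state_counts` and the floating-point tolerance are choices made for this example, and the update rule simply applies the transition probabilities U to T with probability $\zeta_i r$, U or T to E with probability $\zeta_i (1 - r)$.

```python
def expected_state_counts(G, p, r, n_tests):
    """Iterate the expected-value recursion of the basic model.

    Hypothetical helper; returns a list of tuples
    (E(G_U), E(G_T), E(G_E)) for test cases 0..n_tests.
    """
    u, t, e = float(G), 0.0, 0.0
    history = [(u, t, e)]
    for _ in range(n_tests):
        if u + t < p - 1e-9:
            break  # fewer than p constructs are still available
        zeta = p / (u + t)  # selection probability for this test case
        u, t, e = (
            (1.0 - zeta) * u,
            zeta * r * u + (1.0 - zeta * (1.0 - r)) * t,
            zeta * (1.0 - r) * (u + t) + e,
        )
        history.append((u, t, e))
    return history


if __name__ == "__main__":
    for state in expected_state_counts(G=100, p=10, r=0.5, n_tests=4):
        print(state)
```

In every step the expected number of eliminated constructs grows by p(1 - r), and the three expected values always add up to G.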

From equation (3) and the initial conditions $E(G_{U,0}) = G$, $E(G_{T,0}) = E(G_{E,0}) = 0$, the continuously approximated structural coverage function $\kappa(i)$ can be derived [12, 15]:

$$\kappa(i) = \begin{cases} 1 - \left(1 - \frac{p}{G}(1 - r)\, i\right)^{\frac{1}{1-r}} & \text{if } 0 \le r < 1 \text{ and } i \le \left[\frac{p}{G}(1 - r)\right]^{-1} \\ 1 - \exp\left(-\frac{p}{G}\, i\right) & \text{if } r = 1 \end{cases} \tag{4}$$

This result confirms that the partial redundancy model is a generalization of the approaches by Piwowarski et al. and by Rivers and Vouk: For r = 1, i.e., if constructs are never eliminated, the exponential model derived by Piwowarski et al. is obtained. Assuming perfect avoidance of redundancy (r = 0), on the other hand, leads to a linear relationship between the number of test cases executed and code coverage achieved, like in the setup proposed by Rivers and Vouk. Apart from these extremes, the model also contains the more realistic cases of partial redundancy in code construct sampling.
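The two limiting cases can be checked numerically. The following Python sketch evaluates the coverage function of equation (4); the function name `kappa` and its argument order are assumptions made for this illustration.

```python
import math


def kappa(i, G, p, r):
    """Continuously approximated expected coverage after i test cases.

    Sketch of equation (4): G constructs, p constructs per test case,
    redundancy level r between 0 and 1.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1")
    if r == 1.0:
        return 1.0 - math.exp(-p / G * i)
    base = 1.0 - p / G * (1.0 - r) * i
    if base < 0.0:
        raise ValueError("i exceeds the admissible number of test cases")
    return 1.0 - base ** (1.0 / (1.0 - r))


if __name__ == "__main__":
    for r in (0.0, 0.5, 0.9, 1.0):
        print(r, kappa(4, G=100, p=10, r=r))
```

For r = 0 the function is the straight line (p/G) i, and for r close to one it approaches the exponential form of the r = 1 branch.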

2.3.2 The distribution of the number of code constructs covered 

[0062] While the functional form of the approximated structural coverage growth is one important characteristic of the partial redundancy model, the distribution of $\frac{G_{T,i} + G_{E,i}}{G}$ around its expected value is also of interest. Of course, equivalently the probability mass function of $Q_i = G_{T,i} + G_{E,i}$ can be discussed.

[0063] Restricting the analysis to the special case r = 1 (i.e., to full potential redundancy as in the approach by Piwowarski et al.), the model setup can be formulated in terms of an urn model: Let each of the G code constructs be represented by an urn. During one test case execution, p different urns are randomly selected from all G urns, and a ball is put into each of these chosen urns. What is the probability that the number of urns containing at least one ball, $Q_i$, is exactly $q_i$?

[0064] The case p = 1 is identical to the classical occupancy problem [25], while the case p = 2 was discussed by Feller [8] as the chromosome problem. It seems that the general case with p taking any integer value was first analyzed by Mantel and Pasternack [33], who dubbed it the "committee problem". While they derived the probability mass function of $Q_i$ by induction, Sprott [39] used the inclusion-exclusion principle for the same end. Since his method can easily be adapted for discussing the basic partial redundancy model with $0 \le r < 1$, it will be described in the following paragraphs.

[0065] Defining the events $A_1, A_2, \dots, A_G$ by

$A_k$ := "The k-th construct is not executed by any of the test cases",

the probability of $q_i$ executed code constructs after i test cases is equal to the probability of the occurrence of exactly $G - q_i$ of these events. A general formula for calculating the probability of the realization of m among N events is given by Feller [8] as

$$P_{[m]} = \sum_{k=m}^{N} (-1)^{k-m} \binom{k}{m} S_k = \sum_{j=0}^{N-m} (-1)^j \binom{m+j}{j} S_{m+j} \tag{5}$$

with $S_{m+j}$ denoting the sum of the probabilities of the occurrence of at least (m + j) events, i.e.,

$$S_{m+j} = \sum_{k_1 < k_2 < \dots < k_{m+j}} P\left(A_{k_1} \cap A_{k_2} \cap \dots \cap A_{k_{m+j}}\right)$$

[0066] Applying equation (5) to our problem yields [39]

$$P(Q_i = q_i) = \sum_{j=0}^{q_i} (-1)^j \binom{G - q_i + j}{j} S_{G - q_i + j} \tag{6}$$

[0067] The probability of (at least) $G - q_i + j$ specified constructs not being sensitized by any of the i test cases is equal to

$$\left[\frac{\binom{q_i - j}{p}}{\binom{G}{p}}\right]^{i} \tag{7}$$

[0068] Since the $G - q_i + j$ specified constructs can be chosen from all G constructs, $S_{G - q_i + j}$ is the sum of $\binom{G}{G - q_i + j}$ of such probabilities:

$$S_{G - q_i + j} = \binom{G}{G - q_i + j} \left[\frac{\binom{q_i - j}{p}}{\binom{G}{p}}\right]^{i} \tag{8}$$

[0069] Due to the relationship

$$\binom{G}{G - q_i + j} \binom{G - q_i + j}{j} = \binom{G}{q_i} \binom{q_i}{j}$$

combining (6) and (8) results in the probability mass function

$$P(Q_i = q_i) = \binom{G}{q_i} \sum_{j=0}^{q_i} (-1)^j \binom{q_i}{j} \left[\frac{\binom{q_i - j}{p}}{\binom{G}{p}}\right]^{i}$$

or, setting $k = q_i - j$,

$$P(Q_i = q_i) = \binom{G}{q_i} \sum_{k=0}^{q_i} (-1)^{q_i - k} \binom{q_i}{k} \left[\frac{\binom{k}{p}}{\binom{G}{p}}\right]^{i} \tag{10}$$

Equation (6) can also be used as the starting point for deriving $P(Q_i = q_i)$ in the general basic partial redundancy model with $0 \le r \le 1$. However, formulating $S_{G - q_i + j}$ is a more complicated task.

[0070] The number of possible ways to choose p code constructs and eliminate p(1 - r) of them per test case is

$$\binom{G}{p}\binom{p}{p(1-r)} \binom{G - p(1-r)}{p}\binom{p}{p(1-r)} \cdots \binom{G - p(1-r)(i-1)}{p}\binom{p}{p(1-r)} = \prod_{l=0}^{i-1} \binom{G - p(1-r)\,l}{p} \binom{p}{p(1-r)} \tag{11}$$

This value is equal to zero if $p > G - p(1 - r)(i - 1)$.

[0071] There are exactly

$$\prod_{l=0}^{i-1} \binom{q_i - j - p(1-r)\,l}{p} \binom{p}{p(1-r)}$$

different possibilities for doing this such that (at least) $G - q_i + j$ specified events $A_k$ occur, i.e., such that $G - q_i + j$ specified code constructs are not covered. For $j > q_i - p(1 - r)(i - 1) - p$ this value is equal to zero.

[0072] Consequently, the occurrence probability of (at least) $G - q_i + j$ specified events is

$$\prod_{l=0}^{i-1} \frac{\binom{q_i - j - p(1-r)\,l}{p}}{\binom{G - p(1-r)\,l}{p}} \tag{12}$$

This probability is only defined for $p \le G - p(1 - r)(i - 1)$.

[0073] Since there are $\binom{G}{G - q_i + j}$ different ways to choose $G - q_i + j$ constructs out of all G constructs, the probability sum is given by

$$S_{G - q_i + j} = \binom{G}{G - q_i + j} \prod_{l=0}^{i-1} \frac{\binom{q_i - j - p(1-r)\,l}{p}}{\binom{G - p(1-r)\,l}{p}} \tag{13}$$

with $p \le G - p(1 - r)(i - 1)$. For $j > q_i - p(1 - r)(i - 1) - p$ this sum is equal to zero.

[0074] Combining equations (6) and (13) finally yields the probability mass function for the general basic partial redundancy model:

$$P(Q_i = q_i) = \left[\prod_{l=0}^{i-1} \binom{G - p(1-r)\,l}{p}\right]^{-1} \binom{G}{q_i} \sum_{j=0}^{q_i} (-1)^j \binom{q_i}{j} \prod_{l=0}^{i-1} \binom{q_i - j - p(1-r)\,l}{p} \tag{14}$$

with $p \le G - p(1 - r)(i - 1)$.

[0075] While this analytical closed form expression for the probability mass function can easily be recognized as a generalization of the one connected with the committee problem and while it can be used for determining the exact expected value of $Q_i$ (cf. section 2.3.3), its implementation produces irregular results: Due to numerical problems caused by the alternating signs and the size of the probability sums $S_{G - q_i + j}$, some of the probabilities calculated can be above one or below zero. An example is shown in figure 3. The full range of the probability mass function of $Q_i$ for G = 100, p = 10, i = 4 and r = 1 is depicted in the left diagram. Since this plot is dominated by the two "probabilities" for $q_i = 39$ and $q_i = 40$ that are completely out of bounds, in the right diagram the range is reduced to between zero and thirty-eight.

Figure 3: Probability mass functions of $P(Q_i = q_i)$ for G = 100, p = 10, i = 4 and r = 1 calculated via closed-form expression (|) and recursively (o)
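The numerical problems of the closed-form expression stem from floating-point cancellation, not from the formula itself. As a hedged illustration (not the implementation referred to in the text), the following Python sketch evaluates the closed-form probability mass function of the basic model with exact rational arithmetic, where the alternating sum cannot lose precision. The name `closed_form_pmf` and the parameter `k` (the integer with r = k/p) are assumptions of this example.

```python
from fractions import Fraction
from math import comb


def closed_form_pmf(G, p, k, i):
    """Exact P(Q_i = q) for q = 0..G via the inclusion-exclusion formula.

    Hypothetical helper: k = r * p constructs per test case stay
    re-testable, so e = p - k constructs are eliminated per test case.
    """
    e = p - k

    def f(q):
        # Product over the i test cases of "q - e*l choose p".
        out = 1
        for l in range(i):
            top = q - e * l
            if top < p:
                return 0
            out *= comb(top, p)
        return out

    f_G = f(G)
    if f_G == 0:
        raise ValueError("p must not exceed G - p(1 - r)(i - 1)")
    pmf = []
    for q in range(G + 1):
        total = sum((-1) ** j * comb(q, j) * f(q - j) for j in range(q + 1))
        pmf.append(Fraction(comb(G, q) * total, f_G))
    return pmf


if __name__ == "__main__":
    pmf = closed_form_pmf(G=100, p=10, k=10, i=4)
    print(float(pmf[39]), float(pmf[40]), float(sum(pmf)))
```

Because Python integers have arbitrary precision, every probability computed this way lies between zero and one; the price is a higher run time than with floating-point numbers.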



Generalizing the suggestion by Finkelstein et al. [9] to implement a recursive formula for $P(Q_i = q_i)$ for the classical occupancy problem (i.e., for the special case p = 1 and r = 1) and based on the related code by Tucker [43], the recursive equation

$$P(Q_i = q_i) = \sum_{j=0}^{p} \frac{\binom{G - (q_i - j)}{j} \binom{q_i - j - p(1-r)(i-1)}{p - j}}{\binom{G - p(1-r)(i-1)}{p}}\, P(Q_{i-1} = q_i - j)$$

has been programmed in C together with the initial condition

$$P(Q_1 = q_1) = \begin{cases} 1 & \text{if } q_1 = p \\ 0 & \text{otherwise} \end{cases}$$

[0076] The probabilities calculated with this equation for the case G = 100, p = 10, i = 4 and r = 1 are shown as circles in the right diagram of figure 3. Obviously, the closed-form expression (10) produces apt probability masses for smaller values of $q_i$; however, at $q_i = 37$ the numerical problems begin to show.
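The original recursion was programmed in C; that code is not reproduced here. The following Python sketch is an independent illustration of the same idea under stated assumptions: per test case, j of the p sampled constructs are new and p - j are re-executed, which gives a hypergeometric transition probability. The name `recursive_pmf` and the parameter `k` (with r = k/p) are hypothetical, and exact fractions are used only to make the checks below exact.

```python
from fractions import Fraction
from math import comb


def recursive_pmf(G, p, k, n_tests):
    """P(Q_i = q) for q = 0..G after n_tests test cases, built recursively."""
    e = p - k  # constructs eliminated per test case, p(1 - r)
    pmf = [Fraction(0)] * (G + 1)
    pmf[p] = Fraction(1)  # the first test case covers exactly p constructs
    for i in range(2, n_tests + 1):
        pool = G - e * (i - 1)  # constructs in state U or T
        if pool < p:
            raise ValueError("not enough constructs left for test case %d" % i)
        new = [Fraction(0)] * (G + 1)
        for q_prev, prob in enumerate(pmf):
            if prob == 0:
                continue
            untested = G - q_prev             # state U
            retestable = q_prev - e * (i - 1)  # state T
            for j in range(min(p, untested) + 1):
                if p - j > retestable:
                    continue  # not enough re-testable constructs
                weight = Fraction(
                    comb(untested, j) * comb(retestable, p - j), comb(pool, p)
                )
                new[q_prev + j] += prob * weight
        pmf = new
    return pmf


if __name__ == "__main__":
    pmf = recursive_pmf(G=100, p=10, k=10, n_tests=4)
    print([round(float(x), 4) for x in pmf[30:41]])
```

All terms of the recursion are non-negative, so no cancellation occurs and the same scheme also behaves well with ordinary floating-point numbers.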

[0077] To illustrate the influence of the redundancy level r on the distribution of $Q_i$, figure 4 depicts the recursively derived probability mass functions $P(Q_i = q_i)$ for G = 100, p = 10, i = 10 and four different values of r. As expected, increasing the redundancy in sampling code constructs shifts the entire distribution to the left: The probability that at least $q_i$ different code constructs are executed decreases for all possible values of $q_i$.

[0078] In addition, the variance of $Q_i$ gets higher. This property is also reasonable: The more constructs are replaced after each test case, the less restriction there is as to which constructs can be sensitized next. In the special case of no redundancy (r = 0), the setup connected to the Rivers-Vouk model, the probability mass function degenerates to one point; after i test cases the probability of having covered $p \cdot i$ code constructs is equal to 100 per cent.

Figure 4: Probability mass functions of $P(Q_i = q_i)$ for G = 100, p = 10, i = 10 and various redundancy levels r (panels include r = 0.1 and r = 0.4)

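2.3.3 Exact expected structural coverage growth

[0079] While the closed-form expression (14) was inadequate for implementation, it can be used for deriving the exact expected value of $Q_i$, as opposed to its continuously approximated version

$$E(Q(i)) = G\,\kappa(i) = \begin{cases} G\left[1 - \left(1 - \frac{p}{G}(1 - r)\, i\right)^{\frac{1}{1-r}}\right] & \text{if } 0 \le r < 1 \text{ and } i \le \left[\frac{p}{G}(1 - r)\right]^{-1} \\ G\left[1 - \exp\left(-\frac{p}{G}\, i\right)\right] & \text{if } r = 1 \end{cases}$$

following from (4).

[0080] A convenient way to do so is via the concept of factorial series distributions. According to Johnson and Kotz [25] a proper discrete probability mass function that can be written in the form

$$P(X = x) = \binom{\theta}{x} \frac{\Delta^x f(0)}{f(\theta)}, \qquad x = 0, 1, \dots, \theta,$$

with

$$\Delta^x f(0) = \sum_{k=0}^{x} \binom{x}{k} (-1)^{x-k} f(k) = \sum_{j=0}^{x} \binom{x}{j} (-1)^j f(x - j)$$

is called a "factorial series distribution". Its expected value is then given by

$$E(X) = \theta\, \frac{f(\theta) - f(\theta - 1)}{f(\theta)} = \theta \left[1 - \frac{f(\theta - 1)}{f(\theta)}\right]$$

[0081] Choosing $\theta = G$ and

$$f(q_i) = \prod_{l=0}^{i-1} \binom{q_i - p(1-r)\,l}{p},$$

the distribution of $Q_i$, equation (14), can be shown to be a factorial series distribution:

$$P(Q_i = q_i) = \left[\prod_{l=0}^{i-1} \binom{G - p(1-r)\,l}{p}\right]^{-1} \binom{G}{q_i} \sum_{j=0}^{q_i} \binom{q_i}{j} (-1)^j \prod_{l=0}^{i-1} \binom{q_i - j - p(1-r)\,l}{p}$$

Consequently, the exact expected value of $Q_i$ in the basic partial redundancy model with $0 \le r \le 1$ is

$$E(Q_i) = G \left[1 - \prod_{l=0}^{i-1} \frac{\binom{G - 1 - p(1-r)\,l}{p}}{\binom{G - p(1-r)\,l}{p}}\right] = G \left[1 - \prod_{l=0}^{i-1} \left(1 - \frac{p}{G - p(1-r)\,l}\right)\right]$$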
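To see how far the continuous approximation deviates from the exact expectation, both can be evaluated side by side. The Python sketch below assumes the product form of the exact expected value, $G[1 - \prod_{l=0}^{i-1}(1 - p/(G - p(1-r)l))]$, and the approximation $G\kappa(i)$; the function names are hypothetical.

```python
import math


def exact_expected_coverage(G, p, r, i):
    """Exact expected number of covered constructs after i test cases."""
    prod = 1.0
    for l in range(i):
        # Probability that a given construct is missed by test case l + 1.
        prod *= 1.0 - p / (G - p * (1.0 - r) * l)
    return G * (1.0 - prod)


def approx_expected_coverage(G, p, r, i):
    """Continuously approximated expected number of covered constructs."""
    if r == 1.0:
        return G * (1.0 - math.exp(-p / G * i))
    base = 1.0 - p / G * (1.0 - r) * i
    return G * (1.0 - base ** (1.0 / (1.0 - r)))


if __name__ == "__main__":
    for r in (0.0, 0.5, 1.0):
        print(r, exact_expected_coverage(100, 10, r, 4),
              approx_expected_coverage(100, 10, r, 4))
```

For r = 0 both values coincide at p times i; for r = 1 the exact value G[1 - (1 - p/G)^i] lies slightly above the exponential approximation.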



2.4 Extended partial redundancy models 

[0082] The basic partial redundancy model discussed in the last section can be extended to include fault detection and correction. This requires the distinction between faulty and correct code constructs. Let there be six different states a code construct may take: "untested and correct" (UC), "untested and faulty" (UF), "tested and correct" (TC), "tested and faulty" (TF), "eliminated and correct" (EC) and "eliminated and faulty" (EF). Consequently, the four assumptions of the basic model have to be reformulated as follows:

1. The program under test consists of G code constructs. At the beginning of testing, $u_0$ of these constructs are in state UF (i.e., they are untested and faulty); the remaining $(G - u_0)$ constructs are in state UC.

2. Per test case, exactly p constructs are sensitized.

3. The p constructs are randomly chosen from those constructs residing in one of the states UF, UC, TF or TC at the beginning of the test case execution.

4. On average, a constant fraction $r = k/p$ ($k \in \{0, 1, \dots, p\}$) of those constructs exercised by a test case may be tested again in the future. If such a construct is faulty after the test case execution, it changes to (or stays in) state TF; if it is correct after the test case execution, it moves to (or remains in) state TC. The other constructs are eliminated and take state EF or state EC, respectively.

[0083] The rephrasing of the assumptions lays the foundation for the model extension, but it does not change expected code coverage growth. Equation (4) still holds true. However, these assumptions are not sufficient for determining the distribution of $M_i = u_0 - G_{UF,i} - G_{TF,i} - G_{EF,i}$, the number of failure occurrences as a function of the number of test cases executed, as well as its (continuously approximated) expected value. Additional assumptions have to be made.

2.4.1 The first extended partial redundancy model 

[0084] Several existing models, e.g. the ENHPP framework by Gokhale et al. [10], suggest that executing a faulty construct may only result in a failure occurrence if the construct is exercised for the first time. Within the context of the partial redundancy models such a proposition can be formulated as follows:

5. When a code construct at which a fault is located is exercised for the first time, the fault causes a failure with activation probability s ($0 < s \le 1$). The fault is then removed instantaneously and perfectly. If no failure occurs during the first execution of the code construct, then the fault will not be detected until the end of testing.

[0085] In the resulting Markov graph of this model shown in figure 5, self loops have been omitted in order to maintain a less cluttered diagram.

Figure 5: Markov graph of the first extended partial redundancy model

[0086] The transition probabilities depend on r, s and $z_i^{e1}$. The superscript e1 of the selection probability indicates that it is linked to the first extended partial redundancy model, in opposition to $z_i$ of the basic model. It is the chance of each individual construct in one of the states UC, TC, UF and TF of being selected during the i-th test case:

$$z_i^{e1} = \frac{p}{G_{UC,i-1} + G_{UF,i-1} + G_{TC,i-1} + G_{TF,i-1}}$$

2.4.1.1 The approximated number of expected failure occurrences

[0087] Again, the expected number of code constructs in the different states can be written in recursive form,

$$\begin{pmatrix} E(G_{UC,i}) \\ E(G_{UF,i}) \\ E(G_{TC,i}) \\ E(G_{TF,i}) \\ E(G_{EC,i}) \\ E(G_{EF,i}) \end{pmatrix} = \begin{pmatrix} 1 - \zeta_i^{e1} & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 - \zeta_i^{e1} & 0 & 0 & 0 & 0 \\ \zeta_i^{e1} r & \zeta_i^{e1} r s & 1 - \zeta_i^{e1} \bar{r} & 0 & 0 & 0 \\ 0 & \zeta_i^{e1} r \bar{s} & 0 & 1 - \zeta_i^{e1} \bar{r} & 0 & 0 \\ \zeta_i^{e1} \bar{r} & \zeta_i^{e1} \bar{r} s & \zeta_i^{e1} \bar{r} & 0 & 1 & 0 \\ 0 & \zeta_i^{e1} \bar{r} \bar{s} & 0 & \zeta_i^{e1} \bar{r} & 0 & 1 \end{pmatrix} \begin{pmatrix} E(G_{UC,i-1}) \\ E(G_{UF,i-1}) \\ E(G_{TC,i-1}) \\ E(G_{TF,i-1}) \\ E(G_{EC,i-1}) \\ E(G_{EF,i-1}) \end{pmatrix} \tag{15}$$

with $\bar{r} = 1 - r$, $\bar{s} = 1 - s$ and $\zeta_i^{e1}$ standing for the selection probability $z_i^{e1}$ with all random variables being replaced by their respective expected values, i.e.,

$$\zeta_i^{e1} = \frac{p}{E(G_{UC,i-1}) + E(G_{UF,i-1}) + E(G_{TC,i-1}) + E(G_{TF,i-1})}$$

[0088] Using equation (15) and the initial conditions $E(G_{UF,0}) = u_0$, $E(G_{UC,0}) = G - u_0$ and $E(G_{TF,0}) = E(G_{TC,0}) = E(G_{EF,0}) = E(G_{EC,0}) = 0$, it has been shown that the expected number of failure occurrences during the first i test cases can be continuously approximated by the following mean value function [12, 15]:

$$\mu(i) = u_0 s\, \kappa(i) = \begin{cases} u_0 s \left[1 - \left(1 - \frac{p}{G}(1 - r)\, i\right)^{\frac{1}{1-r}}\right] & \text{if } 0 \le r < 1 \text{ and } i \le \left[\frac{p}{G}(1 - r)\right]^{-1} \\ u_0 s \left[1 - \exp\left(-\frac{p}{G}\, i\right)\right] & \text{if } r = 1 \end{cases} \tag{16}$$

[0089] This result fits well with intuition: If fault detection is possible at the first execution of a faulty construct only, then the number of failures experienced is proportional to the expected level of code coverage attained. The constant of proportionality, $u_0 s$, can be interpreted as the expected number of detectable faults in the software at the beginning of testing.



2.4.1.2 The distribution of the number of failure occurrences

[0090] In order to cause a failure, a faulty construct has to be executed. Since only the first execution of a faulty construct may result in the detection of the failure, the distribution of $M_i$ depends on the distribution of the number of faulty code constructs executed at least once during the first i test cases. Let this value be denoted by the random variable $X_{f,i}$. Its probability mass function can be determined following the same line of thought as in section 2.3.2. The event $X_{f,i} = x_{f,i}$ is equivalent to the occurrence of exactly $u_0 - x_{f,i}$ of the following $u_0$ events $B_1, B_2, \dots, B_{u_0}$:

$B_k$ := "The k-th faulty construct is not executed by any of the test cases"

[0091] In the case of $u_0$ events the general inclusion-exclusion equation (5) takes the form

$$P(X_{f,i} = x_{f,i}) = \sum_{j=0}^{x_{f,i}} (-1)^j \binom{u_0 - x_{f,i} + j}{j} S_{u_0 - x_{f,i} + j} \tag{17}$$

For calculating the probability sums $S_{u_0 - x_{f,i} + j}$, the probability of the occurrence of (at least) $u_0 - x_{f,i} + j$ specific events has to be derived. While the number of ways for selecting p code constructs per test case, each time replacing pr of them, is given by equation (11), the number of ways for doing this such that $u_0 - x_{f,i} + j$ specific faulty constructs are not executed is equal to

$$\prod_{l=0}^{i-1} \binom{G - (u_0 - x_{f,i} + j) - p(1-r)\,l}{p} \binom{p}{p(1-r)} \tag{18}$$

which takes the value zero if $j > G - u_0 + x_{f,i} - p(1 - r)(i - 1) - p$.

[0092] Consequently, the probability for this result, defined only for $p \le G - p(1 - r)(i - 1)$, is

$$\left[\prod_{l=0}^{i-1} \binom{G - (u_0 - x_{f,i} + j) - p(1-r)\,l}{p}\right] \left[\prod_{l=0}^{i-1} \binom{G - p(1-r)\,l}{p}\right]^{-1} \tag{19}$$

and the probability sum is then

$$S_{u_0 - x_{f,i} + j} = \binom{u_0}{u_0 - x_{f,i} + j} \left[\prod_{l=0}^{i-1} \binom{G - (u_0 - x_{f,i} + j) - p(1-r)\,l}{p}\right] \left[\prod_{l=0}^{i-1} \binom{G - p(1-r)\,l}{p}\right]^{-1} \tag{20}$$

[0093] From equations (17) and (20) follows the probability mass function of $X_{f,i}$:

$$P(X_{f,i} = x_{f,i}) = \binom{u_0}{x_{f,i}} \sum_{j=0}^{x_{f,i}} (-1)^j \binom{x_{f,i}}{j} \prod_{l=0}^{i-1} \frac{\binom{G - u_0 + x_{f,i} - j - p(1-r)\,l}{p}}{\binom{G - p(1-r)\,l}{p}} \tag{21}$$

$$= \binom{u_0}{x_{f,i}} \sum_{k=0}^{x_{f,i}} (-1)^{x_{f,i} - k} \binom{x_{f,i}}{k} \prod_{l=0}^{i-1} \frac{\binom{G - u_0 + k - p(1-r)\,l}{p}}{\binom{G - p(1-r)\,l}{p}} \tag{22}$$

If $p(1 - r)(i - 1) + p - (G - u_0)$ is larger than zero, then the summation in this equation only has to be done for values of j smaller than or equal to $G - u_0 + x_{f,i} - p(1 - r)(i - 1) - p$ or values of k larger than or equal to $p(1 - r)(i - 1) + p - (G - u_0)$, respectively.

[0094] In the case of perfect fault identification, i.e., if the activation probability s is equal to one, the (first) execution of a faulty construct always results in a failure occurrence. Since this means that $M_i$ is identical with $X_{f,i}$, the distribution of the number of failures experienced during the first i test cases is then given by equation (22).

[0095] More generally, for $0 < s \le 1$, given the number of faulty code constructs covered up to the i-th test case, $x_{f,i}$, the number of failure occurrences follows a binomial distribution of size $x_{f,i}$ with success probability s. Therefore, the probability mass function of $M_i$ can be calculated as

$$P(M_i = m_i) = \sum_{x_{f,i} = m_i}^{u_0} P(M_i = m_i \mid x_{f,i})\, P(X_{f,i} = x_{f,i}) = \sum_{x_{f,i} = m_i}^{u_0} \binom{x_{f,i}}{m_i} s^{m_i} (1 - s)^{x_{f,i} - m_i}\, P(X_{f,i} = x_{f,i}) \tag{23}$$

[0096] Combining this relationship with equation (22) one obtains the closed form expression

$$P(M_i = m_i) = \sum_{x_{f,i} = m_i}^{u_0} \binom{x_{f,i}}{m_i} s^{m_i} (1 - s)^{x_{f,i} - m_i} \binom{u_0}{x_{f,i}} \sum_{k=0}^{x_{f,i}} (-1)^{x_{f,i} - k} \binom{x_{f,i}}{k} \prod_{l=0}^{i-1} \frac{\binom{G - u_0 + k - p(1-r)\,l}{p}}{\binom{G - p(1-r)\,l}{p}}$$

[0097] However, as could be expected from the experience with the probability mass function of $Q_i$ in the basic model, the implementation of this equation shows numerical problems connected to the relative size of the terms and the alternating signs. Therefore, the probabilities $P(X_{f,i} = x_{f,i})$ used in equation (23) are actually determined recursively via the starting probabilities

$$P(X_{f,1} = x_{f,1}) = \frac{\binom{u_0}{x_{f,1}} \binom{G - u_0}{p - x_{f,1}}}{\binom{G}{p}} \qquad \text{for } x_{f,1} = 0, 1, \dots, \min(p, u_0)$$

and the relationship

$$P(X_{f,i} = x_{f,i}) = \sum_{\Delta x_{f,i} = 0}^{p} \frac{\binom{u_0 - x_{f,i} + \Delta x_{f,i}}{\Delta x_{f,i}} \binom{G - p(1-r)(i-1) - u_0 + x_{f,i} - \Delta x_{f,i}}{p - \Delta x_{f,i}}}{\binom{G - p(1-r)(i-1)}{p}}\, P(X_{f,i-1} = x_{f,i} - \Delta x_{f,i})$$

where $\Delta x_{f,i}$ is the number of faulty constructs executed for the first time during the i-th test case.

[0098] For the simple example of a software consisting of 100 code constructs and a test case size of 10 code constructs, the probability mass functions resulting from different values of r and s are visualized in figure 6.

Figure 6: Probability mass functions of $P(M_i = m_i)$ according to the first extended model for G = 100, p = 10, $u_0$ = 30, i = 10 and various redundancy levels r and activation probabilities s

[0099] While both increasing s and lowering r shift the distribution to the right, the influence of the activation probability is much higher.
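The two-stage calculation for the first extended model (a recursion for the number of faulty constructs executed at least once, followed by binomial thinning with the activation probability s) can be sketched in Python as follows. This is an illustration under stated assumptions, not the original implementation: the function names are hypothetical, `k` is the integer with r = k/p, and exact fractions are used so that the checks are exact.

```python
from fractions import Fraction
from math import comb


def pmf_faulty_covered(G, p, k, u0, n_tests):
    """P(X = x), x = 0..u0: faulty constructs executed at least once."""
    e = p - k  # constructs eliminated per test case, p(1 - r)
    pmf = [Fraction(0)] * (u0 + 1)
    for x in range(min(p, u0) + 1):  # first test case: hypergeometric
        pmf[x] = Fraction(comb(u0, x) * comb(G - u0, p - x), comb(G, p))
    for i in range(2, n_tests + 1):
        pool = G - e * (i - 1)  # constructs that can still be selected
        new = [Fraction(0)] * (u0 + 1)
        for x_prev, prob in enumerate(pmf):
            if prob == 0:
                continue
            fresh = u0 - x_prev  # faulty constructs never executed so far
            rest = pool - fresh  # all other selectable constructs
            for dx in range(min(p, fresh) + 1):
                weight = Fraction(
                    comb(fresh, dx) * comb(rest, p - dx), comb(pool, p)
                )
                new[x_prev + dx] += prob * weight
        pmf = new
    return pmf


def pmf_failures_first_model(G, p, k, u0, s, n_tests):
    """P(M = m), m = 0..u0, by binomial thinning of the pmf of X."""
    s = Fraction(s)
    px = pmf_faulty_covered(G, p, k, u0, n_tests)
    return [
        sum(comb(x, m) * s ** m * (1 - s) ** (x - m) * px[x]
            for x in range(m, u0 + 1))
        for m in range(u0 + 1)
    ]


if __name__ == "__main__":
    pmf = pmf_failures_first_model(100, 10, 5, 30, Fraction(1, 2), 10)
    print(float(sum(m * x for m, x in enumerate(pmf))))
```

Passing s as a `Fraction` keeps all probabilities exact; with floating-point values of s the same code runs on the exact binary representation of that float.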

2.4.1.3 The exact number of expected failure occurrences

[0100] Setting $\theta = u_0$ and

$$f(x_{f,i}) = \prod_{l=0}^{i-1} \binom{G - u_0 + x_{f,i} - p(1-r)\,l}{p}$$

the probability mass function of $X_{f,i}$ can be written as a factorial series distribution:

$$P(X_{f,i} = x_{f,i}) = \binom{u_0}{x_{f,i}} \frac{\sum_{j=0}^{x_{f,i}} \binom{x_{f,i}}{j} (-1)^j f(x_{f,i} - j)}{f(u_0)}$$

[0101] Therefore, the exact expected value of $X_{f,i}$ is

$$E(X_{f,i}) = u_0 \left[1 - \prod_{l=0}^{i-1} \left(1 - \frac{p}{G - p(1-r)\,l}\right)\right]$$

and for the expected value of $M_i$ follows:

$$E(M_i) = s\, E(X_{f,i}) = s\, u_0 \left[1 - \prod_{l=0}^{i-1} \left(1 - \frac{p}{G - p(1-r)\,l}\right)\right] = \frac{s\, u_0}{G}\, E(Q_i) \tag{24}$$

[0102] Just like the mean value function, which is proportional to the continuously approximated expected code coverage function, the exact expected value of the number of failure occurrences is proportional to the exact expected value of the number of executed code constructs.

2.4.2 The second extended partial redundancy model

[0103] Since a code construct can be exercised with input variables taking different values, it is possible that the re-execution of constructs which are still faulty may cause a failure. To account for that, assumption 5 of the first extended model is replaced by the following one:

5. When a code construct at which a fault is located is exercised for the first time or repeatedly, the fault causes a failure with constant activation probability s ($0 < s \le 1$).

[0104] The fault is then removed instantaneously and perfectly.

[0105] This proposition adds two more transitions to the Markov graph of the model (cf. figure 7).

[0106] As the code constructs executed during a test case are taken from the states UC, TC, UF and TF, like in the first extended partial redundancy model, the selection probability $z_i^{e2}$ is equal to $z_i^{e1}$.

[0107] If the execution of a faulty construct always results in a failure, or if each construct is eliminated after being tested for the first time, then the state TF cannot be taken by any construct. Therefore, for either s = 1 or r = 0 the second extended model is effectively identical to the first extended model.

Figure 7: Markov graph of the second extended partial redundancy model

2.4.2.1 The approximated number of expected failure occurrences

[0108] For this model variation, the recursive relationship for the expected number of code constructs in the different states takes the form

$$\begin{pmatrix} E(G_{UC,i}) \\ E(G_{UF,i}) \\ E(G_{TC,i}) \\ E(G_{TF,i}) \\ E(G_{EC,i}) \\ E(G_{EF,i}) \end{pmatrix} = \begin{pmatrix} 1 - \zeta_i^{e2} & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 - \zeta_i^{e2} & 0 & 0 & 0 & 0 \\ \zeta_i^{e2} r & \zeta_i^{e2} r s & 1 - \zeta_i^{e2} \bar{r} & \zeta_i^{e2} r s & 0 & 0 \\ 0 & \zeta_i^{e2} r \bar{s} & 0 & 1 - \zeta_i^{e2} (1 - r \bar{s}) & 0 & 0 \\ \zeta_i^{e2} \bar{r} & \zeta_i^{e2} \bar{r} s & \zeta_i^{e2} \bar{r} & \zeta_i^{e2} \bar{r} s & 1 & 0 \\ 0 & \zeta_i^{e2} \bar{r} \bar{s} & 0 & \zeta_i^{e2} \bar{r} \bar{s} & 0 & 1 \end{pmatrix} \begin{pmatrix} E(G_{UC,i-1}) \\ E(G_{UF,i-1}) \\ E(G_{TC,i-1}) \\ E(G_{TF,i-1}) \\ E(G_{EC,i-1}) \\ E(G_{EF,i-1}) \end{pmatrix}$$

with $\bar{r} = 1 - r$, $\bar{s} = 1 - s$ and

$$\zeta_i^{e2} = \frac{p}{E(G_{UC,i-1}) + E(G_{UF,i-1}) + E(G_{TC,i-1}) + E(G_{TF,i-1})}$$

From this equation and the initial conditions, which are the same as for the first extended partial redundancy model, follows the mean value function [12, 15]

$$\mu(i) = \begin{cases} u_0\, \frac{s}{1 - r(1 - s)} \left[1 - \left(1 - \frac{p}{G}(1 - r)\, i\right)^{\frac{1 - r(1 - s)}{1 - r}}\right] & \text{if } 0 \le r < 1 \text{ and } i \le \left[\frac{p}{G}(1 - r)\right]^{-1} \\ u_0 \left[1 - \exp\left(-\frac{p}{G}\, s\, i\right)\right] & \text{if } r = 1 \end{cases} \tag{25}$$

Clearly, for r > 0 and s < 1, $\mu(i)$ is not proportional to $\kappa(i)$. Rather, as shown in [12, 15], the derivative of $\mu(i)/\kappa(i)$ with respect to i is then larger than zero. This means that as testing proceeds relatively more failures occur per percentage of newly gained code construct coverage. Furthermore, the derivative of the expected number of detectable faults, $u_0 \frac{s}{1 - r(1 - s)}$, with respect to r is larger than zero if s < 1. The higher the degree of redundancy, the more of the faults immanent in the software at the beginning of testing will be detected on average. The reason for these properties of the second model variation lies in the fact that it allows for positive effects of redundancies in sampling code constructs: Faults in those constructs already covered before may still be detected.

[0109] For s = 1 or r = 0, equation (25) reduces to the mean value function of the first extended model (16), as expected.
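The relationship between the two mean value functions can be checked numerically. The sketch below is an illustration in Python; it assumes the forms $\mu(i) = u_0 s \kappa(i)$ for the first extended model and, for the second, the prefactor $s / (1 - r(1 - s))$ with exponent $(1 - r(1 - s)) / (1 - r)$. The function names `mu_first` and `mu_second` are hypothetical.

```python
import math


def mu_first(i, G, p, r, s, u0):
    """Mean value function of the first extended model."""
    if r == 1.0:
        return u0 * s * (1.0 - math.exp(-p / G * i))
    base = 1.0 - p / G * (1.0 - r) * i
    return u0 * s * (1.0 - base ** (1.0 / (1.0 - r)))


def mu_second(i, G, p, r, s, u0):
    """Mean value function of the second extended model."""
    if r == 1.0:
        return u0 * (1.0 - math.exp(-p / G * s * i))
    a = 1.0 - r * (1.0 - s)  # chance that a selected faulty construct
                             # leaves the pool of faulty, testable ones
    base = 1.0 - p / G * (1.0 - r) * i
    return u0 * s / a * (1.0 - base ** (a / (1.0 - r)))


if __name__ == "__main__":
    args = dict(G=100, p=10, u0=30)
    for r, s in [(0.0, 0.5), (0.9, 0.5), (0.9, 1.0)]:
        print(r, s, mu_first(10, r=r, s=s, **args),
              mu_second(10, r=r, s=s, **args))
```

The two functions agree for s = 1 or r = 0, while for r > 0 and s < 1 the second model predicts more failures, reflecting the additional detection chances at re-execution.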

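2.4.2.2 The distribution of the number of failure occurrences

[0110] If no constructs are eliminated, i.e., for a redundancy level of one, the structure of the second extended model is simple enough for deriving a closed form expression of the probability mass function $P(M_i = m_i)$.

[0111] Since a certain fault is not detected either if the respective construct is not executed or if it is executed without activating the fault, the probability for $u_0 - m_i + j$ specified faults not causing a failure during one test case is

$$\binom{G}{p}^{-1} \sum_{x=0}^{p} \binom{u_0 - m_i + j}{x} (1 - s)^x \binom{G - u_0 + m_i - j}{p - x}$$

[0112] The probability sum for at least $(u_0 - m_i + j)$ faults not detected by the first i test cases is then

$$S_{u_0 - m_i + j} = \binom{u_0}{u_0 - m_i + j} \left[\binom{G}{p}^{-1} \sum_{x=0}^{p} \binom{u_0 - m_i + j}{x} (1 - s)^x \binom{G - u_0 + m_i - j}{p - x}\right]^{i}$$

[0113] Combining this with the adapted version of the inclusion-exclusion equation (5),

$$P(M_i = m_i) = \sum_{j=0}^{m_i} (-1)^j \binom{u_0 - m_i + j}{j} S_{u_0 - m_i + j}$$

one obtains the probability mass function

$$P(M_i = m_i) = \binom{u_0}{m_i} \sum_{j=0}^{m_i} (-1)^j \binom{m_i}{j} \left[\binom{G}{p}^{-1} \sum_{x=0}^{p} \binom{u_0 - m_i + j}{x} (1 - s)^x \binom{G - u_0 + m_i - j}{p - x}\right]^{i}$$

$$= \binom{u_0}{m_i} \binom{G}{p}^{-i} \sum_{k=0}^{m_i} (-1)^{m_i - k} \binom{m_i}{k} \left[\sum_{x=0}^{p} \binom{u_0 - k}{x} (1 - s)^x \binom{G - u_0 + k}{p - x}\right]^{i} \tag{26}$$

[0114] For the special case of a program consisting of faulty constructs only, i.e., $u_0 = G$, this equation becomes the one of the randomized committee problem discussed by Sprott [39].

[0115] Let the random variable $\Delta X_{ff,i}$ denote the number of faulty code constructs sensitized by the i-th test case. (In contrast, the random variable $\Delta X_{f,i}$ used when discussing the first extended model referred to the number of faulty constructs executed for the first time during the i-th test case.) Then a recursive formulation of equation (26) is

$$P(M_i = m_i) = \sum_{\Delta m_i = 0}^{p} \sum_{\Delta x_{ff,i} = \Delta m_i}^{p} \frac{\binom{u_0 - m_i + \Delta m_i}{\Delta x_{ff,i}} \binom{G - u_0 + m_i - \Delta m_i}{p - \Delta x_{ff,i}}}{\binom{G}{p}} \binom{\Delta x_{ff,i}}{\Delta m_i} s^{\Delta m_i} (1 - s)^{\Delta x_{ff,i} - \Delta m_i}\, P(M_{i-1} = m_i - \Delta m_i) \tag{27}$$

with the initial probabilities

$$P(M_1 = m_1) = \sum_{\Delta x_{ff,1} = m_1}^{\min(p, u_0)} \frac{\binom{u_0}{\Delta x_{ff,1}} \binom{G - u_0}{p - \Delta x_{ff,1}}}{\binom{G}{p}} \binom{\Delta x_{ff,1}}{m_1} s^{m_1} (1 - s)^{\Delta x_{ff,1} - m_1}$$

for $m_1 = 0, 1, \dots, \min(p, u_0)$.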
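For a redundancy level of one, the recursion over the number of detected faults can be sketched in a few lines of Python. The code below is an illustration under stated assumptions rather than the original implementation: it starts from $M_0 = 0$ with probability one, which after one step yields the same distribution as the initial probabilities for $M_1$, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def pmf_failures_second_model_r1(G, p, u0, s, n_tests):
    """P(M_i = m), m = 0..u0, second extended model with r = 1."""
    s = Fraction(s)
    pmf = [Fraction(0)] * (u0 + 1)
    pmf[0] = Fraction(1)  # no failures before the first test case
    for _ in range(n_tests):
        new = [Fraction(0)] * (u0 + 1)
        for m_prev, prob in enumerate(pmf):
            if prob == 0:
                continue
            faulty = u0 - m_prev  # constructs that are still faulty
            for dx in range(min(p, faulty) + 1):
                # dx faulty constructs among the p sampled ones
                w_sel = Fraction(
                    comb(faulty, dx) * comb(G - faulty, p - dx), comb(G, p)
                )
                for dm in range(dx + 1):
                    # dm of them activate their fault
                    w_det = comb(dx, dm) * s ** dm * (1 - s) ** (dx - dm)
                    new[m_prev + dm] += prob * w_sel * w_det
        pmf = new
    return pmf


if __name__ == "__main__":
    pmf = pmf_failures_second_model_r1(100, 10, 30, Fraction(1, 2), 10)
    print(float(sum(m * x for m, x in enumerate(pmf))))
```

Since no construct is eliminated for r = 1, each fault survives a test case independently with probability 1 - sp/G, which gives a simple check of the computed mean.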

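[0116] A redundancy level between zero and one complicates the relationships further. Indeed, so far a closed form expression of $P(M_i = m_i)$ could not be derived for this case. However, a recursive equation can be formulated. With the random variable $\Xi_{i-1}$ denoting the number of faulty code constructs either corrected or eliminated during the first i - 1 test cases, for a given value $\xi_{i-1}$ the joint probability distribution of $\Delta M_i$ and $\Delta X_{ff,i}$ is

$$P(\Delta M_i = \Delta m_i, \Delta X_{ff,i} = \Delta x_{ff,i} \mid \xi_{i-1}) = \frac{\binom{u_0 - \xi_{i-1}}{\Delta x_{ff,i}} \binom{G - p(1-r)(i-1) - u_0 + \xi_{i-1}}{p - \Delta x_{ff,i}}}{\binom{G - p(1-r)(i-1)}{p}} \times \binom{\Delta x_{ff,i}}{\Delta m_i} s^{\Delta m_i} (1 - s)^{\Delta x_{ff,i} - \Delta m_i} \tag{28}$$

[0117] In turn, $\Delta \Xi_i$ is the sum of the number of faults detected, $\Delta m_i$, plus the number of still faulty constructs eliminated during the i-th test case. Therefore, given $\Delta m_i$ and $\Delta x_{ff,i}$, the conditional probability of $\Delta \Xi_i$ taking the value $\Delta \xi_i$ ($\Delta m_i \le \Delta \xi_i \le \Delta x_{ff,i}$) is equal to the probability of eliminating exactly $\Delta \xi_i - \Delta m_i$ (and, consequently, of replacing exactly $\Delta x_{ff,i} - \Delta \xi_i$) of the $\Delta x_{ff,i} - \Delta m_i$ faulty constructs remaining in the sample chosen by the test case:

$$P(\Delta \Xi_i = \Delta \xi_i \mid \Delta m_i, \Delta x_{ff,i}) = \frac{\binom{\Delta x_{ff,i} - \Delta m_i}{\Delta \xi_i - \Delta m_i} \binom{p - \Delta x_{ff,i} + \Delta m_i}{p(1-r) - \Delta \xi_i + \Delta m_i}}{\binom{p}{p(1-r)}} \tag{29}$$

for $\Delta \xi_i = \Delta m_i, \Delta m_i + 1, \dots, \Delta x_{ff,i}$.

[0118] Multiplying equations (28) and (29) and summing over $\Delta x_{ff,i}$ results in

$$P(\Delta M_i = \Delta m_i, \Delta \Xi_i = \Delta \xi_i \mid \xi_{i-1}) = \sum_{\Delta x_{ff,i} = \Delta \xi_i}^{p} \frac{\binom{\Delta x_{ff,i} - \Delta m_i}{\Delta \xi_i - \Delta m_i} \binom{p - \Delta x_{ff,i} + \Delta m_i}{p(1-r) - \Delta \xi_i + \Delta m_i}}{\binom{p}{p(1-r)}} \times \frac{\binom{u_0 - \xi_{i-1}}{\Delta x_{ff,i}} \binom{G - p(1-r)(i-1) - u_0 + \xi_{i-1}}{p - \Delta x_{ff,i}}}{\binom{G - p(1-r)(i-1)}{p}} \binom{\Delta x_{ff,i}}{\Delta m_i} s^{\Delta m_i} (1 - s)^{\Delta x_{ff,i} - \Delta m_i}$$

which can be used in the recursive equation

$$P(M_i = m_i, \Xi_i = \xi_i) = \sum_{\Delta m_i = 0}^{p} \sum_{\Delta \xi_i = \Delta m_i}^{p} P(\Delta M_i = \Delta m_i, \Delta \Xi_i = \Delta \xi_i \mid M_{i-1} = m_i - \Delta m_i, \Xi_{i-1} = \xi_i - \Delta \xi_i)\, P(M_{i-1} = m_i - \Delta m_i, \Xi_{i-1} = \xi_i - \Delta \xi_i)$$

$$= \sum_{\Delta m_i = 0}^{p} \sum_{\Delta \xi_i = \Delta m_i}^{p} P(\Delta M_i = \Delta m_i, \Delta \Xi_i = \Delta \xi_i \mid \Xi_{i-1} = \xi_i - \Delta \xi_i)\, P(M_{i-1} = m_i - \Delta m_i, \Xi_{i-1} = \xi_i - \Delta \xi_i) \tag{30}$$

with starting probabilities

$$P(M_1 = m_1, \Xi_1 = \xi_1) = \sum_{\Delta x_{ff,1} = \xi_1}^{p} \frac{\binom{u_0}{\Delta x_{ff,1}} \binom{G - u_0}{p - \Delta x_{ff,1}}}{\binom{G}{p}} \binom{\Delta x_{ff,1}}{m_1} s^{m_1} (1 - s)^{\Delta x_{ff,1} - m_1} \frac{\binom{\Delta x_{ff,1} - m_1}{\xi_1 - m_1} \binom{p - \Delta x_{ff,1} + m_1}{p(1-r) - \xi_1 + m_1}}{\binom{p}{p(1-r)}} \tag{31}$$

for calculating the joint probability function of $M_i$ and $\Xi_i$. The last line of equation (30) holds true because both $M_i$ and $\Xi_i$ only depend on $M_{i-1}$ through $\Xi_{i-1}$. How many faults can still be activated and eliminated directly depends on the number of faulty constructs to be tested in the future.

[0119] From equation (30) finally follows the probability mass function of $M_i$,

$$P(M_i = m_i) = \sum_{\xi_i = m_i}^{u_0} \sum_{\Delta m_i = 0}^{p} \sum_{\Delta \xi_i = \Delta m_i}^{p} \sum_{\Delta x_{ff,i} = \Delta \xi_i}^{p} \frac{\binom{u_0 - \xi_{i-1}}{\Delta x_{ff,i}} \binom{G - p(1-r)(i-1) - u_0 + \xi_{i-1}}{p - \Delta x_{ff,i}}}{\binom{G - p(1-r)(i-1)}{p}} \binom{\Delta x_{ff,i}}{\Delta m_i} s^{\Delta m_i} (1 - s)^{\Delta x_{ff,i} - \Delta m_i}$$

$$\times \frac{\binom{\Delta x_{ff,i} - \Delta m_i}{\Delta \xi_i - \Delta m_i} \binom{p - \Delta x_{ff,i} + \Delta m_i}{p(1-r) - \Delta \xi_i + \Delta m_i}}{\binom{p}{p(1-r)}} \times P(M_{i-1} = m_i - \Delta m_i, \Xi_{i-1} = \xi_i - \Delta \xi_i) \tag{32}$$

where $\xi_{i-1} = \xi_i - \Delta \xi_i$.

[0120] For the special case r = 1 discussed above the random variables $M_i$ and $\Xi_i$ are identical, and equation (32) reduces to equation (27).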
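The joint recursion over the number of detected faults and the number of faulty constructs corrected or eliminated can be sketched as follows. This Python code is only an illustration under stated assumptions: it tracks the pair (m, ξ) in a dictionary, starts from (0, 0) before the first test case, and combines three factors per test case (selection of faulty constructs, activation of their faults, elimination of still faulty constructs). The function name and the parameter `k` (with r = k/p) are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def pmf_failures_second_model(G, p, k, u0, s, n_tests):
    """P(M_i = m), m = 0..u0, second extended model with r = k / p."""
    e = p - k  # constructs eliminated per test case, p(1 - r)
    s = Fraction(s)
    joint = {(0, 0): Fraction(1)}  # (failures, faulty constructs gone)
    for i in range(1, n_tests + 1):
        pool = G - e * (i - 1)  # constructs that can still be selected
        new = defaultdict(Fraction)
        for (m, xi), prob in joint.items():
            faulty = u0 - xi  # faulty constructs still in the pool
            for dx in range(min(p, faulty) + 1):
                w_sel = Fraction(
                    comb(faulty, dx) * comb(pool - faulty, p - dx),
                    comb(pool, p),
                )
                if w_sel == 0:
                    continue
                for dm in range(dx + 1):
                    w_det = comb(dx, dm) * s ** dm * (1 - s) ** (dx - dm)
                    still = dx - dm  # executed, but fault not activated
                    for de in range(min(still, e) + 1):
                        # de still-faulty constructs are eliminated
                        w_el = Fraction(
                            comb(still, de) * comb(p - still, e - de),
                            comb(p, e),
                        )
                        new[(m + dm, xi + dm + de)] += (
                            prob * w_sel * w_det * w_el
                        )
        joint = new
    pmf = [Fraction(0)] * (u0 + 1)
    for (m, _), prob in joint.items():
        pmf[m] += prob
    return pmf


if __name__ == "__main__":
    pmf = pmf_failures_second_model(30, 4, 2, 6, Fraction(1, 2), 5)
    print([round(float(x), 4) for x in pmf])
```

The number of (m, ξ) states grows with $u_0$, so for larger examples floating-point probabilities may be preferable to exact fractions; all terms are non-negative, so no cancellation occurs.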

[0121] In figure 8 the probability distributions calculated recursively for the setup discussed as an example in se<Alon 
2.4.1 2 are depicted. There Is no big difference to the probability mass functions implied by the first extended partial 
redundancy model shown in figure 6. However, for r = 0.9 the positive effects of redundancy taken into account by the 
2s second extended partial redundancy nwdel are noticeable in the fomi of a slight shift of the distributions of RAf,= mj 
to the right. 

2,4.2,3 The exact number of expected taiture occurrences 

[0122] The probability mass function (26) obtained for $r = 1$ is a factorial series distribution, which can be shown by setting $\theta = u_0$ and



/K) - [est cvja - i2r)]': 

[0(1 -»)"©]' 



/(«o) 

uo\ E,^ [t:^^' - sri^'-^^-i)]" 
Figure 8: Probability mass functions of $P(M_i = m_i)$ according to the second extended model for $G = 100$, $p = 10$, $u_0 = 30$, $i = 10$ and various redundancy levels $r$ and activation probabilities $s$

[0123] Consequently, the exact expected value of $M_i$ is given by



(33) 



2.5 Model estimation

[0124] Besides the discussion of the mean value functions, the probability mass functions, etc. following from the model assumptions, the question of how to estimate the model parameters for a given data set is of high importance. In this section, several estimation methods are discussed.

2.5.1 Least squares estimation

2.5.1.1 Fitting the cumulative number of failure occurrences

[0125] Like the partial redundancy models, the model by Tohma et al. [24, 41, 42] as well as the Rivers-Vouk model [37, 38] are based on the hypergeometric distribution. For both models there have been suggestions to estimate the parameters by fitting the mean value functions to the observed cumulative number of failure occurrences via the least squares method [24, 37].

[0126] Accordingly, the function to be minimized in order to determine the parameter estimates is the sum of the squared deviations of the cumulative number of failures experienced by the $i$-th test case from the respective expected values $E(M_i)$ or, as an approximation, from the value of the mean value function $\mu(i)$. Table 1 lists the functional forms obtained for both cases and depending on the value of $r$. They are the same for the two extended partial redundancy models; merely the interpretation of the parameters $\alpha$, $\beta$ and $\gamma$ is different, as shown in table 2.

Table 1: Functional forms obtained for the sum of squared errors using $E(M_i)$ and $\mu(i)$

Table 2: Interpretation of the identifiable parameters
[0127] Several conclusions can be drawn from this way of looking at the sum of squared errors:

• Using the least squares approach, not all of the parameters of the partial redundancy models can be identified, but only the compound parameters $\alpha$, $\beta$ and, where applicable, $\gamma$, or other combinations of them.

• Since based on these identifiable compound parameters the structure of the sum of squared errors is the same for the first and the second extended model, it is not possible to distinguish between the two models. Only the interpretation of the compound parameters is different.

• So far, it could be shown that the functional form given for the sum of squared errors based on the exact expected value of $M_i$ under the condition $0 \le r < 1$ holds true for the first extended partial redundancy model only. However, based on the structure that shows in tables 1 and 2, one may conclude that in the second extended model $E(M_i)$ is given by



l-r + r5 [ 11 V G-p(l-r)/;j' 

[0128] From a statistician's point of view, fitting the cumulative number of failure occurrences is problematic for several reasons, which are all connected to the fact that each random variable $M_i$ is the sum of the number of failure occurrences within the first $(i - 1)$ test cases, $M_{i-1}$, and the number of failures experienced during the $i$-th test case, $\Delta M_i$:

1. One may well expect that the variance of the $M_i$ is not constant but increasing with $i$. As a consequence of this heteroscedasticity the least squares estimators are still unbiased, but inefficient [11].

2. Moreover, the disturbances $\varepsilon_i$ in the model $M_i = E(M_i) + \varepsilon_i$ are correlated. The effects of this autocorrelation are the same as for heteroscedasticity [11].

3. The fact that both $i$ and $m_i$ are trended series also gives rise to the spurious regression problem due to which the significance of the assumed relationship between the two series is overestimated [11].

[0129] Therefore, least squares estimation should rather be done based on the number of failures observed during each individual test case.
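The first two points can be illustrated with a small simulation. The following is only a minimal sketch and not part of the original analysis: it assumes, purely for illustration, independent test cases that each cause one failure with a constant probability `fail_prob`, which is much simpler than the partial redundancy models. All function names are hypothetical.

```python
import random
from statistics import mean, pvariance


def simulate_cumulative(n_tests, n_runs, fail_prob, seed=0):
    """Simulate n_runs test campaigns; each path holds the cumulative failure counts M_1..M_n."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        total, path = 0, []
        for _ in range(n_tests):
            # Illustrative assumption: one failure with constant probability per test case.
            total += 1 if rng.random() < fail_prob else 0
            path.append(total)
        runs.append(path)
    return runs


def column(runs, i):
    """Cumulative failure counts after test case i + 1 (0-based index i) over all runs."""
    return [path[i] for path in runs]


def correlation(xs, ys):
    """Sample correlation; equals the correlation of the disturbances M_i - E(M_i)."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pvariance(xs) ** 0.5 * pvariance(ys) ** 0.5)


runs = simulate_cumulative(n_tests=20, n_runs=2000, fail_prob=0.3)
print("Var(M_1)  =", pvariance(column(runs, 0)))
print("Var(M_20) =", pvariance(column(runs, 19)))
print("Corr(M_19, M_20) =", correlation(column(runs, 18), column(runs, 19)))
```

Under these assumptions the variance of the cumulative counts grows with the test case number and neighbouring cumulative counts are strongly correlated, whereas the per-test-case counts are independent by construction.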

2.5.1.2 Fitting the number of failure occurrences per test case

[0130] Several publications discussing the hypergeometric model by Tohma et al. endorse minimizing the sum of squares of the deviations of $\Delta m_i$ from their expected values. While Dohi et al. [4] do not give any detailed information, Tohma et al. [42] clearly use the conditional expected values given the number of failure occurrences in the previous test cases for calculating the sum of squares

$$\sum_i \left[\Delta m_i - E(\Delta M_i \mid \Delta m_1, \ldots, \Delta m_{i-1})\right]^2 .$$

[0131] For the first extended model, 

£;(AMi|Ami....,Ami.O = lr)(i - 1) ^"^^ " I ^^i' Am,-i)) = 

_ ^ P r 

~ ^G-p(l-r)(i-l)V'^"1"J> 



which leads to the sum of squared errors 



(34) 



with parameters $\alpha$, $\beta$, $\gamma$ equal to those used in the least squares estimation based on the cumulative number of failure occurrences, listed in the second line of table 2.

[0132] However, directly applying equation (34) may yield inadequate results: in general, there are many test cases which do not activate any faults. Therefore, the minimum sum of squared errors is often obtained for p =

= 0! One remedy, avoiding the domination of the non-failure test cases in the parameter estimation, is to group the test cases as follows: Let the set $I$ consist of the ordered numbers of those test cases for which at least one failure occurred, i.e., $I = \{i_1, i_2, \ldots, i_k\} = \{i \mid \Delta m_i \in \{1, 2, \ldots\}\}$ with $i_1 < i_2 < \ldots < i_k$. Then calculate the sum of squared errors as



k 

E 



(35) 
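The grouping step can be sketched in code. This is a hedged illustration only: the function names are hypothetical, and the expected number of failures per group is taken here as the increment $\mu(i_j) - \mu(i_{j-1})$ of some user-supplied mean value function `mu`, which is an assumption made for the example and not necessarily the conditional expectation used in equation (35).

```python
def failure_test_cases(delta_m):
    """Ordered 1-based numbers of the test cases with at least one failure (the set I)."""
    return [i for i, dm in enumerate(delta_m, start=1) if dm > 0]


def grouped_sse(delta_m, mu):
    """Sum of squared errors over groups that each end with a failure-revealing test case.

    delta_m: failures per test case; mu: assumed mean value function with mu(0) = 0.
    """
    sse = 0.0
    prev = 0
    for i in failure_test_cases(delta_m):
        # All failures of the group (prev, i] occur in test case i by construction.
        observed = delta_m[i - 1]
        expected = mu(i) - mu(prev)
        sse += (observed - expected) ** 2
        prev = i
    return sse


# Example: failures only in test cases 3 and 5, linear illustrative mean value function.
print(failure_test_cases([0, 0, 2, 0, 1]))
print(grouped_sse([0, 0, 2, 0, 1], lambda i: 0.5 * i))
```

Test cases after the last failure are ignored by this grouping, so trailing failure-free test cases carry no weight in the fit.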



[0133] For the second extended model, the conditional expected value of $\Delta M_i$ is given by

$$E(\Delta M_i \mid \Delta m_1, \ldots, \Delta m_{i-1}) = \frac{s\,p}{G - p(1-r)(i-1)} \left(u_0 - E(\Xi_{i-1} \mid \Delta m_1, \ldots, \Delta m_{i-1})\right),$$

and the complexity of $E(\Xi_{i-1} \mid \Delta m_1, \ldots, \Delta m_{i-1})$ deters the derivation of the sum of squared errors.
[0134] One drawback of this estimation approach is that the disturbances in the model



can only take integer values. Therefore, it is not reasonable to assume that they follow a normal distribution. Consequently, the distribution of the estimators is unknown, and confidence intervals for estimated and predicted values cannot be calculated.



2.5.2 Maximum likelihood estimation

2.5.2.1 Interpreting the partial redundancy models as NHPP models

[0135] In the Justified Model Selection [18] it has been suggested to use the continuously approximated expected value of $M_i$ obtained for the Rivers-Vouk model as the mean value function of a non-homogeneous Poisson process model. Interestingly, the predictability (measured in terms of the short term and long term absolute relative errors) increased as compared to the least squares estimation based on the cumulative number of failure occurrences.
[0136] Like the Rivers-Vouk model the partial redundancy models are based on the hypergeometric distribution. It therefore seems reasonable to apply the same heuristic approach in their context.

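[0137] With $i_1, \ldots, i_k$ denoting the ordered set of test case numbers for which measurements are available and $\Delta m_{i_j} = m_{i_j} - m_{i_{j-1}}$ representing the number of failure occurrences since the last measurement, the log-likelihood function takes the general form

$$\ln L(\theta; \{(i_1, \Delta m_{i_1}), \ldots, (i_k, \Delta m_{i_k})\}) = \sum_{j=1}^{k} \left[\Delta m_{i_j} \ln\left(\mu(i_j) - \mu(i_{j-1})\right)\right] - \mu(i_k) - \sum_{j=1}^{k} \ln(\Delta m_{i_j}!).$$

[0138] The log-likelihood resulting for the two extended partial redundancy models is then

$$\ln L(\alpha, \beta, \gamma; \{(i_1, \Delta m_{i_1}), \ldots, (i_k, \Delta m_{i_k})\}) = \begin{cases} \sum_{j=1}^{k} \left\{\Delta m_{i_j} \ln\left[\alpha \exp(-\beta i_{j-1}) - \alpha \exp(-\beta i_j)\right]\right\} - \alpha\left[1 - \exp(-\beta i_k)\right] + C & \text{if } \gamma = 0 \\ \sum_{j=1}^{k} \left\{\Delta m_{i_j} \ln\left[\alpha (1 - \gamma i_{j-1})^{\beta} - \alpha (1 - \gamma i_j)^{\beta}\right]\right\} - \alpha\left[1 - (1 - \gamma i_k)^{\beta}\right] + C & \text{otherwise} \end{cases}$$

with $C = -\sum_{j=1}^{k} \ln(\Delta m_{i_j}!)$ being a value that does not depend on any of the model parameters. Just like with least squares estimation, not all of the parameters of the models but only the compound parameters $\alpha$, $\beta$, $\gamma$ explained in table 2 (and functions of them) can be estimated.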
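The general form of this log-likelihood translates directly into code. The following is a minimal sketch, assuming grouped measurements `(i_j, delta_m_j)` sorted by test case number and a mean value function `mu` with `mu(0) = 0`; the function name and the example parameter values are hypothetical and chosen only for illustration.

```python
import math


def nhpp_log_likelihood(mu, measurements):
    """Log-likelihood of grouped failure counts under an NHPP with mean value function mu.

    measurements: list of (i_j, delta_m_j) pairs, sorted by test case number i_j.
    Assumes mu(0) == 0 and that mu is non-decreasing.
    """
    log_l = 0.0
    prev_i = 0
    for i_j, delta_m in measurements:
        increment = mu(i_j) - mu(prev_i)
        if delta_m > 0:
            if increment <= 0:
                # Observed failures in an interval with zero expected failures.
                return float("-inf")
            log_l += delta_m * math.log(increment)
        log_l -= math.lgamma(delta_m + 1)  # ln(delta_m!)
        prev_i = i_j
    # The increments telescope to mu(i_k) - mu(0) = mu(i_k).
    return log_l - mu(prev_i)


# Exponential mean value function with illustrative parameter values.
alpha, beta = 50.0, 0.05
mu_exp = lambda i: alpha * (1.0 - math.exp(-beta * i))
print(nhpp_log_likelihood(mu_exp, [(5, 3), (10, 2), (20, 4)]))
```

Passing the mean value function as a callable keeps the likelihood code independent of the particular functional form, so either case of the piecewise expression can be plugged in.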

2.5.2.2 Maximising the likelihood implied by the model setup

[0139] Instead of conducting a maximum likelihood estimation based on the likelihoods following under the heuristic approach of interpreting the partial redundancy models as non-homogeneous Poisson processes, one can directly use the likelihoods connected to the model setup. However, their complexity may make this method unfeasible for certain model variants.

[0140] Analyzing the special case of certain fault activation ($s = 1$), for which the two extended models coincide, the probability distribution of the number of failure occurrences during the $i$-th test case given the history of failures experienced is simply hypergeometric:

$$P(\Delta M_i = \Delta m_i \mid \Delta m_1, \ldots, \Delta m_{i-1}) = \frac{\binom{u_0 - m_{i-1}}{\Delta m_i} \binom{G - u_0 + m_{i-1} - p(1-r)(i-1)}{p - \Delta m_i}}{\binom{G - p(1-r)(i-1)}{p}} \qquad (36)$$



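[0141] If the number of faults detected is available for each individual test case, i.e., if $\{i_1, i_2, \ldots, i_k\} = \{1, 2, \ldots, k\}$, then the log-likelihood function is

$$\begin{aligned}
\ln L(G, p, r, u_0; \Delta m_1, \ldots, \Delta m_k) &= \ln \prod_{i} \frac{\binom{u_0 - m_{i-1}}{\Delta m_i} \binom{G - u_0 + m_{i-1} - p(1-r)(i-1)}{p - \Delta m_i}}{\binom{G - p(1-r)(i-1)}{p}} \\
&= \sum_i \ln((u_0 - m_{i-1})!) - \sum_i \ln(\Delta m_i!) - \sum_i \ln((u_0 - m_i)!) \\
&\quad + \sum_i \ln((G - u_0 + m_{i-1} - p(1-r)(i-1))!) - \sum_i \ln((p - \Delta m_i)!) \\
&\quad - \sum_i \ln((G - u_0 + m_i - p(1-r)(i-1) - p)!) - \sum_i \ln((G - p(1-r)(i-1))!) \\
&\quad + \sum_i \ln(p!) + \sum_i \ln((G - p(1-r)(i-1) - p)!) \qquad (37)
\end{aligned}$$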

[0142] A problem of this function is that the arguments of the factorials have to be positive integer values (or zero), which highly complicates its maximization. One solution may be to substitute the gamma function $\Gamma(x + 1)$, defined for $(x + 1) \in \mathbb{R}^{+}$, for the factorial $x!$ and hence optimize the log-likelihood

$$\begin{aligned}
\ln \tilde{L}(G, p, r, u_0; \Delta m_1, \ldots, \Delta m_k) &= \sum_i \ln(\Gamma(u_0 - m_{i-1} + 1)) - \sum_i \ln(\Gamma(\Delta m_i + 1)) \\
&\quad - \sum_i \ln(\Gamma(u_0 - m_i + 1)) + \sum_i \ln(\Gamma(G - u_0 + m_{i-1} - p(1-r)(i-1) + 1)) \\
&\quad - \sum_i \ln(\Gamma(p - \Delta m_i + 1)) - \sum_i \ln(\Gamma(G - u_0 + m_i - p(1-r)(i-1) - p + 1)) \\
&\quad - \sum_i \ln(\Gamma(G - p(1-r)(i-1) + 1)) + \sum_i \ln(\Gamma(p + 1)) \\
&\quad + \sum_i \ln(\Gamma(G - p(1-r)(i-1) - p + 1)). \qquad (38)
\end{aligned}$$
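As an illustration of the gamma-function substitution, the generalized log-likelihood can be written compactly by grouping the gamma terms into generalized binomial coefficients. This is a minimal sketch, not the original implementation; the function names are hypothetical, and it assumes that all gamma function arguments are positive for the parameter values passed in.

```python
import math


def log_binom(a, b):
    """ln of the binomial coefficient 'a choose b', generalized via gamma functions."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def generalized_log_likelihood(G, p, r, u0, delta_m):
    """Generalized log-likelihood for s = 1 with real-valued G, p, r and u0.

    delta_m: failures per test case (delta_m[0] belongs to test case 1).
    Assumes every gamma function argument is positive for the given parameters.
    """
    log_l = 0.0
    m_prev = 0  # cumulative number of failures before the current test case
    for i, dm in enumerate(delta_m, start=1):
        remaining = G - p * (1 - r) * (i - 1)  # constructs not yet eliminated
        log_l += (
            log_binom(u0 - m_prev, dm)
            + log_binom(remaining - u0 + m_prev, p - dm)
            - log_binom(remaining, p)
        )
        m_prev += dm
    return log_l


# For integer-valued parameters the result agrees with the factorial form.
print(generalized_log_likelihood(100, 10, 0.5, 30, [2, 1, 0, 3]))
# Real-valued parameters are accepted as well.
print(generalized_log_likelihood(100.5, 10, 0.5, 30.2, [2, 1, 0, 3]))
```

Expanding the three `log_binom` calls per test case yields the nine sums of the gamma-function expression term by term.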

[0143] From the parameter values $\tilde{G}$, $\tilde{p}$, $\tilde{r}$, $\tilde{u}_0$ obtained by maximizing the log-likelihood $\ln \tilde{L}$ the estimates are then obtained as

G=[gJ, p=IpJ, r=LfpjJ. tib=ltibj, (39) 
with $\lfloor x \rfloor$ denoting the largest integer value that does not exceed $x$.

[0144] Real data collected during a project does not necessarily consist of the number of failure occurrences for each of the test cases, but it may include grouped data giving the number of failures experienced during several subsequently executed test cases, $\Delta m_{i_{j-1}+1} + \ldots + \Delta m_{i_j} = m_{i_j} - m_{i_{j-1}}$. For example, the Teddi tool produces such grouped data if testing intervals and failures are logged for an entire sub-chapter of the test specification, because then the failures cannot be counted for any one specific test case belonging to this sub-chapter. Given the number of faults detected before the $(i_{j-1} + 1)$-th test case, the distribution of $\Delta M_{i_{j-1}+1} + \ldots + \Delta M_{i_j}$ is



^ ^ ^ 11 /0-p(l-r)(fc-l)\ ' ■ 

^"»<«i+i) ^ P ^ 

(G-jKl-r)(<o+i)~l)j » (40) 

which reduces to the hypergeometric probability mass function (36) if $i_j = i_{j-1} + 1$. Therefore, the log-likelihood takes the general form

ln£(G,p,r,t^o;^^.,.„,mi,) = b|JJP(AM^ . 

[0145] Note that it is not necessary to implement equation (40) if the probability mass function $P(M_i = m_i; G, p, r, u_0, s)$ of the first extended model (23) has already been implemented, since

= '^^'Wr '"W/ ^-P^^- '>'>'''^'"o - 1). 

However, if the generalized likelihood $\tilde{L}$, permitting all parameters to take non-integer values, is to be used, then a version of equation (23) has to be programmed in which the binomial coefficients have been replaced by combinations of gamma functions.

[0146] For the general first extended partial redundancy model with $0 < s \le 1$ the relationships are more complex. This is mainly due to the fact that, given the number of failure occurrences during the first $i$ test cases and the parameters including $u_0$, the number of detectable faults remaining in the software is not known with certainty. Instead, assuming again that the number of failures experienced for each individual test case is available, based on the failure history $\Delta m_1, \ldots, \Delta m_{i-1}$ the conditional probability mass function for the number of faulty constructs that have already been executed at least once (and whose faults therefore cannot be activated anymore), $X_{f,i-1} = \Delta X_{f,1} + \Delta X_{f,2} + \ldots + \Delta X_{f,i-1}$, can be computed in a recursive manner

PiXi^i-i = Zi,<.i I Ami, Ainj.i) = ^ P(AJf/,<.i = Aar/.i-i | Amj-i) 

Ax/.*_i 

xP{Xi^i^2 = Xf^i^i - Ax/,<_i I Ami, Ami-.2), (41) 



where 
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with 
» and 



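$$P(\Delta X_{f,i-1} = \Delta x_{f,i-1} \mid x_{f,i-2}) = \frac{\binom{u_0 - x_{f,i-2}}{\Delta x_{f,i-1}} \binom{G - u_0 + x_{f,i-2} - p(1-r)(i-2)}{p - \Delta x_{f,i-1}}}{\binom{G - p(1-r)(i-2)}{p}}.$$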

[0147] The initial probabilities are

$P(\Delta X_{f,1} = \Delta x_{f,1} \mid \Delta m_1) =$



for $\Delta x_{f,1} = \Delta m_1, \Delta m_1 + 1, \ldots, \min(p, u_0)$.

[0148] The probability distribution of the number of faulty constructs tested by the $i$-th test case only depends on the failure history via the number of faulty constructs executed during the previous test cases, i.e.,

$$P(\Delta X_{f,i} = \Delta x_{f,i} \mid x_{f,i-1}, \Delta m_1, \ldots, \Delta m_{i-1}) = P(\Delta X_{f,i} = \Delta x_{f,i} \mid x_{f,i-1}) = \frac{\binom{u_0 - x_{f,i-1}}{\Delta x_{f,i}} \binom{G - u_0 + x_{f,i-1} - p(1-r)(i-1)}{p - \Delta x_{f,i}}}{\binom{G - p(1-r)(i-1)}{p}} \qquad (42)$$

[0149] Given $\Delta x_{f,i}$, the probability for $\Delta m_i$ failures experienced during the $i$-th test case in turn does not depend on the number of faulty constructs executed before, or the failure history:

$$P(\Delta M_i = \Delta m_i \mid \Delta x_{f,i}, x_{f,i-1}, \Delta m_1, \ldots, \Delta m_{i-1}) = P(\Delta M_i = \Delta m_i \mid \Delta x_{f,i}) = \binom{\Delta x_{f,i}}{\Delta m_i} s^{\Delta m_i} (1 - s)^{\Delta x_{f,i} - \Delta m_i} \qquad (43)$$
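The two building blocks, the hypergeometric distribution of the faulty constructs executed and the binomial distribution of the failures activated, can be sketched as follows. This is an illustrative sketch with hypothetical function names; it assumes integer-valued $G$, $p$ and $u_0$ and rounds $p(1-r)(i-1)$ to the nearest integer, which is an assumption of this example.

```python
from math import comb


def prob_faulty_executed(dx, x_prev, i, G, p, r, u0):
    """Hypergeometric probability that test case i executes dx not yet executed faulty constructs."""
    remaining = G - round(p * (1 - r) * (i - 1))  # constructs not yet eliminated
    return (
        comb(u0 - x_prev, dx)
        * comb(remaining - u0 + x_prev, p - dx)
        / comb(remaining, p)
    )


def prob_failures(dm, dx, s):
    """Binomial probability that dm of the dx faulty constructs executed cause a failure."""
    return comb(dx, dm) * s**dm * (1 - s) ** (dx - dm)


def prob_failures_given_x(dm, x_prev, i, G, p, r, u0, s):
    """P(dm failures in test case i | x_prev faulty constructs executed before), summing out dx."""
    upper = min(p, u0 - x_prev)
    return sum(
        prob_failures(dm, dx, s) * prob_faulty_executed(dx, x_prev, i, G, p, r, u0)
        for dx in range(dm, upper + 1)
    )


# Illustrative values: G = 100, p = 10, r = 0.5, u0 = 30, third test case, 4 faulty constructs executed so far.
probs = [prob_failures_given_x(dm, 4, 3, 100, 10, 0.5, 30, 0.8) for dm in range(11)]
print(sum(probs))  # the probabilities over all dm should add up to one
```

Summing the product of both probabilities over the unobserved number of faulty constructs executed is the step that makes the likelihood for $s < 1$ considerably more expensive than for $s = 1$.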

[0150] Using equations (41) - (43), the log-likelihood function for the first extended partial redundancy model with $0 < s \le 1$ can then be calculated as

$$\begin{aligned}
\ln L(G, p, r, u_0, s; \Delta m_1, \ldots, \Delta m_k) &= \sum_i \ln P(\Delta M_i = \Delta m_i \mid \Delta m_1, \ldots, \Delta m_{i-1}) \\
&= \sum_i \ln \Bigg[ \sum_{x_{f,i-1}}^{\min(p(i-1),\,u_0)} \sum_{\Delta x_{f,i}}^{p} P(\Delta M_i = \Delta m_i \mid \Delta x_{f,i})\, P(\Delta X_{f,i} = \Delta x_{f,i} \mid x_{f,i-1}) \\
&\qquad \times P(X_{f,i-1} = x_{f,i-1} \mid \Delta m_1, \ldots, \Delta m_{i-1}) \Bigg]. \qquad (44)
\end{aligned}$$

Substituting gamma functions for the factorials contained in the binomial functions, the generalized version of this likelihood, $\tilde{L}$, was implemented in C. Calculating the log-likelihood value for project data consisting of 1^7 test cases and for a specified parameter constellation turned out to take about six minutes. Therefore, applying an optimization routine of S-Plus or R to this function does not seem to be feasible if no way for speeding up the computations involved can be found. Of course, taking into account grouped failure data will further complicate the matter.
[0151] Likewise, the complexity of the log-likelihood of the second extended model with $0 < s \le 1$ can be expected to be at least as high as the one of equation (44).

2.5.3 Comparison of the estimation methods and further insights

[0152] According to the descriptions of the different estimation methods the following comparisons can be made:

1. Least squares estimation using the failure occurrences per test case (LS-Delta) should be preferred over fitting the cumulative number of failure occurrences via the least squares method (LS-Cum).

2. Neither LS-Cum nor maximum likelihood estimation based on the interpretation of the partial redundancy models as non-homogeneous Poisson processes (ML-NHPP) allows distinguishing between the first and the second extended partial redundancy models. LS-Delta and maximum likelihood estimation referring to the likelihood implied by the model setup (ML-Setup) would probably lead to different functional forms for the two models; however, due to the arising complexity it is uncertain whether they are feasible.

3. LS-Cum, LS-Delta and ML-NHPP only allow the estimation of compound parameters whose interpretation depends on the model variation assumed. With ML-Setup an estimate for each model parameter can be determined, but in order to retain a practicable likelihood function it may be necessary to examine special cases, e.g. with an activation probability $s$ equal to 100 per cent.

4. The two least squares methods only make use of the expected number of failure occurrences, not of the distribution around these expected values. The disturbances are definitely not normally distributed, and an alternative to choose is not clearly visible. Therefore, the distribution of the parameter estimators is unknown. While ML-NHPP heuristically assumes the underlying stochastic process to be a non-homogeneous Poisson process, which does not follow from the original propositions of the model, ML-Setup consistently sticks to the properties resulting from the model setup.

5. The conditional distributions forming the log-likelihood in the ML-Setup approach are non-regular in the sense that some of the parameters to be estimated determine the range of the observed variables. For example, in the log-likelihood (37) the possible values of $\Delta m_i$ depend on both $p$ and $u_0$.

[0153] Whether this property of the log-likelihood as well as the approach of maximizing a generalized log-likelihood $\ln \tilde{L}$, in which all parameters may be real-valued, cause any problems remains to be seen.

[0154] Despite the drawbacks of ML-Setup, this approach to parameter estimation seems to be the most stringent one, largely staying within the framework laid by the assumptions of the partial redundancy models.
[0155] However, comparing the fit of the (first) extended partial redundancy model estimated according to all four estimation methods to the data sets of the baseline projects (figures 12 to 26 in appendix A), a property of the model estimates not specific to the ML-Setup approach alone becomes visible: The estimated number of total (detectable) faults in the software tends to be very close to the number of failures experienced until the end of the data set. While this may be acceptable in a post mortem analysis after the entire test plan has been carried out (and even then it can be suspected to be extremely optimistic), it is definitely an objectionable result to get at a test case coverage level of 40 per cent, like for the PPwin project (cf. figure 25). What is the reason for this model characteristic?
[0156] Remembering that for the first extended partial redundancy model the mean value function is proportional to expected code coverage, it becomes clear that the model is inclined to "assume" that full coverage has (almost) been attained at the end of the data set. Sometimes, it is predicted that several additional test cases are needed in order to execute the remaining parts of the software (e.g. for project A, figure 12).

[0157] But how much information on the size of the code areas not tested so far is contained in the failure data collected? This leads us back to the discussion of testing strategies that formed the starting point for deriving the partial redundancy model: When operational testing is conducted, then the different functional regions of the software are not exercised in strict succession; ideally, the specified test cases for all operations are executed in random order. Thus, at the beginning of testing high increases in code coverage are gained, and faults are detected in all parts of the software. The more testing proceeds, the more the rate of additional coverage attained decreases and the fewer faults remain to be found. Therefore, operational testing continuously gives the testers some feedback about the fraction of untested code in the form of a decreasing slope in the cumulative number of failure occurrences as a function of testing effort or test case coverage. However, the decreasing rate of new failures experienced is often regarded as a sign of inefficiency. Instead of measuring the current status the testers should rather increase the software quality at a faster pace. Systematic testing strategies try to reduce redundancies in software execution and give guidelines for directing the testers' attention to those functional areas and inputs thought to be error prone. Obtaining a rather constant value for the number of additional faults detected per unit of effort spent, per test case, or the like, is the goal and an indication of success of such techniques. But this gain comes at the price of a reduced content of information in the failure data collected: the better systematic testing is, the less information about the size of untested code is included in the data set. This is obvious: If ten completely independent test cases are run, each uncovering one fault, then one may well expect one failure occurrence per additional test case. However, the number of additional test cases to be run cannot be estimated based on the past failure data. This value has to be provided in addition to the failure history.
[0158] Classical software reliability models for operational testing are not in need of this extra piece of information. Since they imply full replacement of all code constructs, testing can go on infinitely both in terms of testing effort (according to testing effort based models like the Goel-Okumoto model) and in terms of the number of test cases executed (for models starting out with the number of test cases executed, like the approach by Piwowarski et al.). Such models can be fitted to failure data collected during systematic testing; however, the results cannot be expected to be trustworthy. The model by Piwowarski et al., for example, has the same shape as the Rivers-Vouk model with a linear testing efficiency. For this model, having shown to be the best black box coverage data model according to a number of criteria [18], the estimates of the total number of faults in the software turned out to be unreasonably high for a number of baseline projects. For example, the number of faults remaining in the software tested during project A after 144 failures had occurred was estimated to be 14467 [5]. The reason for this phenomenon is that the exponentially-shaped mean value function of the model fits the almost linear data set well if the parameter representing the total number of faults is chosen large enough [IB, 17].

[0159] The partial redundancy model, including setups related to systematic testing, takes into account the fact that when trying to avoid repeated executions of code constructs testing cannot go on forever in the same manner. As soon as the entire software has been covered once a structural break will necessarily occur. If faults can only be detected when the construct at which they are located is exercised for the first time, then the mean value function is necessarily flat for $i > \frac{G}{p(1-r)}$. Application of the model shows that the failure data collected during systematic testing hardly contains any information about when this structural break will be experienced.

[0160] There seem to be at least two possible ways for providing information about the testing yet to be done:

1. If not only the number of failure occurrences but also the structural coverage (e.g., path coverage) gained is recorded per test case, then the basic partial redundancy model can be used for deriving the estimates $\hat{G}$, $\hat{p}$ and $\hat{r}$. In a second step, $u_0$ can be estimated by maximizing

$$\ln \tilde{L}(u_0; \hat{G}, \hat{p}, \hat{r}, \Delta m_1, \ldots, \Delta m_i) = \ln \tilde{L}(\hat{G}, \hat{p}, \hat{r}, u_0; \Delta m_1, \ldots, \Delta m_i)$$

- equation (38) - with respect to $u_0$.

The main obstacle for applying this method is that collecting code coverage information is not feasible for most of the projects carried out by the SMEs participating in the PETS project. First of all, this is due to the expected overhead of data collection. Furthermore, different tools would be necessary for the various programming languages. Moreover, the code has to be available in order to instrument it for coverage measurement. However, at a system test level it is normally not possible to access source or debug code for this end.

2. The parameters can be estimated from equation (38) under the constraint $p(1-r)(k-1)/G \le 1$, where $k$ is the total number of test cases in the test plan. This restriction implies the belief that the entire software cannot be covered before all test cases have been executed. Of course, it is necessary that the remaining test cases have the same average size as the test cases run so far, and the redundancy level is also assumed to be the same. It should be noted that the estimate of $u_0$ determined with this method can be expected to be close to the one resulting from the approach discussed under 1 only if the coverage that would be attained by the $k$ test cases is indeed close to 100 per cent. In order to diminish the influence of the current test specification, instead of $k$ some higher value $l$ - including the number of additional test cases not contained in the test plan, but necessary to cover the whole software on a black-box level - can be used. However, there still may be effects due to the methodology used for deriving test cases. $\hat{u}_0$ can be regarded as the number of faults expected to be detectable under a given testing strategy.

Such a value may even be of higher interest than the expected number of total faults inherent in the software at the beginning of testing calculated using code coverage information: Often, there are sections of code that cannot be reached at all during program execution; the re-use of existing software components has definitely increased the extent of this phenomenon. Since the faults contained in such "dead" code cannot cause any failures of the software product, the total number of faults remaining in the software is of limited value if the number of unreachable faults included in this full amount is not known. Using the black box approach referring to the different operations and input combinations to be tested may yield a better estimate of the proportion of the software that is executable.
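The second approach amounts to a constrained maximization. The following is a rough sketch only, not the procedure actually used: it replaces a numerical optimizer by an exhaustive search over user-supplied candidate grids, reads the coverage constraint as $p(1-r)(k-1)/G \le 1$, and uses hypothetical function names and illustrative data throughout.

```python
import math
from itertools import product


def log_binom(a, b):
    """ln of the binomial coefficient 'a choose b', generalized via gamma functions."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def log_likelihood(G, p, r, u0, delta_m):
    """Generalized log-likelihood for s = 1; returns -inf for infeasible parameters."""
    total, m_prev = 0.0, 0
    for i, dm in enumerate(delta_m, start=1):
        remaining = G - p * (1 - r) * (i - 1)   # constructs not yet eliminated
        faulty_left = u0 - m_prev               # faults still detectable
        correct_left = remaining - faulty_left  # fault-free constructs remaining
        # Every binomial coefficient needs 0 <= lower index <= upper index.
        if dm > faulty_left or dm > p or p - dm > correct_left:
            return float("-inf")
        total += (
            log_binom(faulty_left, dm)
            + log_binom(correct_left, p - dm)
            - log_binom(remaining, p)
        )
        m_prev += dm
    return total


def within_test_plan(G, p, r, k):
    """Assumed constraint: the first k - 1 test cases must not eliminate more than G constructs."""
    return p * (1 - r) * (k - 1) / G <= 1


def constrained_grid_search(delta_m, k, G_grid, p_grid, r_grid, u0_grid):
    """Return (log-likelihood, G, p, r, u0) of the best feasible grid point, or None."""
    best = None
    for G, p, r, u0 in product(G_grid, p_grid, r_grid, u0_grid):
        if not within_test_plan(G, p, r, k):
            continue
        value = log_likelihood(G, p, r, u0, delta_m)
        if best is None or value > best[0]:
            best = (value, G, p, r, u0)
    return best


# Illustrative failure data for six test cases of a test plan with k = 20 test cases.
print(constrained_grid_search([2, 1, 0, 3, 0, 1], 20, [100, 200, 400], [10], [0.0, 0.5], range(7, 40)))
```

A grid search is used here only to keep the sketch self-contained; in practice a constrained numerical optimizer would take its place.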
[0161] Following the approach in 2 (that could be dubbed ML-SetupC) the parameters were estimated once again for all the data sets. From figures 27 to 41 in Appendix A it can be seen that the first partial redundancy model can produce a spectrum of mean value functions between straight lines and doubly-mirrored exponential functions. S-shaped curves and functions with increasing slopes are outside its scope. These might be modelled with time-varying functions for the redundancy level $r$ (or, if $s$ is not set equal to one, for the fault activation probability $s$). However, these generalizations would add considerable complexity to the derivation of the mean value function, the probability mass function of $M_i$ and the log-likelihood function.

[0162] The parameter estimates obtained with ML-SetupC are listed in table 3. While, for example, the estimated redundancy level may be used to draw conclusions about the quality of the testing process (assuming that avoiding repeated executions of the same portions of code is one of its goals) the analysis could also be reversely directed: Based on information on the testing process one could try to estimate the redundancy level to be attained for a certain test project. There are several reasons why such an approach can be useful:

1. If it is possible to estimate the parameters of a software reliability growth model even if no failure data are available, then the so-called "early prediction" [31] of the software failure pattern can be conducted before testing starts.

2. It is a well-known fact that there is considerable noise in connection with the observed failure occurrences. Especially at the early stage of testing, when only few failure data have been collected, the "dynamic" parameter estimates calculated using the test data can be unstable or even outside the region of meaningful values. If this happens, the static estimates can be substituted for the dynamic estimates.

3. The static estimates might be a reasonable choice for the starting values to be used for numerically optimizing the log-likelihood function of the partial redundancy model.
Table 3: Parameter estimates obtained with ML-SetupC

Project                          $\hat{r}$    $\hat{p}$    $\hat{G}$    $\hat{u}_0$
Project A                        0.000        3872         6033085      425
Project B                        0.000        433664       278676447    465
Project C, first data set        0.000        5349         248083       53
Project C, second data set       0.208        7837         364676       72
Project D                        0.000        899          750420       88
Project E, integration test      0.000        34391        6094106      34
Project E, first system test     0.000        481          55896        47
Project E, second system test    0.738        724          34399        40
Project F                        0.495        1731         1206797      95
Leo                              0.750        1547         75576        71
Skartia                          0.339        875          2032         58
VPro                             0.000        16999        605456       17
WinDeich                         0.000        579          22040        57
PPwin                            0.709        1705         62024        53
Stwin                            1.000        1772         95002        95
[0163] The analysis of environmental information and its association with parameter estimates of the partial redundancy model is described in the remaining part of this study.

3 Environmental information and the partial redundancy model

3.1 The PETS questionnaire version 1.0

[0164] Since its draft version, published in part in the Justified Model Selection [18], the PETS questionnaire has undergone major revisions. The current version 1.0 can be found in appendix B of this document. Its main section still consists of questions concerning the capability of software development and testing processes oriented on the reference model in the technical report version of the standard ISO/IEC 15504 [23]. According to the objective of investigating the influence between software process maturity and parameters of software failure models, emphasis was put on those processes expected to be closely related to software quality, for example the ones belonging to the Engineering processes category. Furthermore, questions concerning system test, the phase in which the failure data analyzed are collected, have been grouped in a separate section.

[0165] In order to further amplify the alignment with this SPICE model, the possible scores for each process have been changed to a range between zero and five, matching the SPICE capability levels. For the scores 0, 1, 3 and 5 detailed scenarios have been developed. They try to give the respondent an idea of what the respective capability level specifically means. The basic ideas of how processes evolve between the different levels have been taken from the process attributes described in the capability dimension of the reference model [23]:

• At level 0, the process is not implemented. If actions are taken or output is produced, then this is done ad-hoc and in an unsystematic way.

• At level 1, the process is performed, which shows in the existence of various process-related work products.

• At level 3, defined processes are tailored for the individual projects. The required resources are estimated and allocated to the processes.

• At level 5, process measurements are taken and used for controlling the process performance. The effectiveness of process changes, triggered by a continuous improvement program, can therefore be assessed.

Basically, the scenarios dispose of the separation between the processes and the generic attributes linked to process capability. Bridging this gap and thus transferring the theoretic model to an environment professionals can more easily relate to is probably the most difficult task to be done during a process capability assessment.
P1661 Of course, the fomiulation of scenarios In the questionnaire bears the risk of producing descriptions that are 
too specific for general use and only fit for certain types of companies. Moreover, for the sake of brevity, not all char- 
acteristics mentioned in the respective process and capability definitions can be included In a scenario. * 
[0167] Whne care has been gh^ to omit references to particular techniques, it is certainly true that the scenarios 
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should only be used as a guidance for filling in the questionnaire, not as a dogma. Moreover, It must be noted that the 
objective of the questionnaire is not to detemiine SPICE levels In a way that can compete with standard assessments, 
but to get an Impression about the general software development and testing quality and analyze Its influence on the 
parameters of a sofhvare failure model. 

[0168] However, it can well be expected that the usage of scenarios decreases subjectivity on the part of the
respondents in rating the processes. Furthermore, empirical evidence described in the next section indicates that despite
the possible problems documented above the questionnaire might be employed beneficially in a standard assessment
as one means in a mix of instruments. These results also back up the suggestion that companies that have already
undergone a standard SPICE assessment can fill in the levels attained for the respective (component) processes
instead of answering all questions.

[0169] Some of the processes included may not be relevant for certain software developing companies. If, for
example, no software programmed by third parties is integrated in the product, then a software acquisition process does
not have to exist. Therefore, individual questions or entire (component) processes may be marked as "not applicable"
in the questionnaire.

[0170] In three additional sections, information about the specific development project excluding the system test
phase, the test project and the software product is assessed, because it is obvious that in addition to the general
process maturity of the software development and test companies involved such specific characteristics influence the
software quality and the software failure pattern encountered during testing.

[0171] In formulating these sections, results of previous research were utilized. In 1985, Takahashi and Kamayachi
[40] analyzed the influence of ten factors on the fault density of thirty software products at the beginning of the unit
testing phase. Zhang and Pham [35, 44] adapted and augmented this list to a set of thirty-two environmental factors
and used a questionnaire to assess their presumed impact on software reliability after release according to software
managers, programmers, system engineers and testers. A selection of these factors for which data collection seemed
possible was included in the PETS questionnaire. Where necessary, the factors were further operationalized by
formulating questions whose answers are defined categories or numerical values. Discussions between the partners of the
PETS project resulted in further suggestions of what aspects to take into account in the questionnaire.

3.2 The PETS questionnaire and SPICE levels 

[0172] Besides the possibility to draw on know-how incorporated in an existing model, the main reason for structuring
the questionnaire according to the SPICE reference model was to enable companies that have already been formally
assessed to make use of the assessment results instead of filling in answers to the questions.
[0173] Therefore, all answers belonging to one (component) process should be condensed to one capability rating
that can be replaced by its respective assessment outcome, if available. Since the scores are measured on an ordinal
scale, the median is the appropriate measure of central tendency which can be used as the "typical" representation of
the answers related to the process. The median of a set of $N$ observations, arranged in increasing order, is often defined
to be the $\left(\frac{N+1}{2}\right)$th observation when $N$ is odd and the average of the $\left(\frac{N}{2}\right)$th and the
$\left(\frac{N}{2}+1\right)$th observation when $N$ is even [34]. The calculation described for the case of an even $N$ is a bit
problematic, since it entails taking the average of comparative values, for which the distances do not have any meaningful
interpretation. Therefore, it seems reasonable to use the largest integer value (the largest score) being smaller than or
equal to the average of the $\left(\frac{N}{2}\right)$th and the $\left(\frac{N}{2}+1\right)$th observation instead.
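
The condensing rule described in this paragraph can be illustrated with a minimal Python sketch. The function name `ordinal_median` and the sample scores are hypothetical and not part of the PETS questionnaire; the sketch further assumes that scores are non-negative integers and that answers marked "not applicable" have been removed beforehand.

```python
def ordinal_median(scores):
    """Condense ordinal scores (e.g. 0..5) into one rating.

    Odd count: the middle observation.
    Even count: the largest integer not exceeding the average of the
    two middle observations, so that no half-levels are produced.
    """
    if not scores:
        raise ValueError("at least one score is required")
    ordered = sorted(scores)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    lower, upper = ordered[n // 2 - 1], ordered[n // 2]
    # Integer division floors the average of two non-negative integers.
    return (lower + upper) // 2


# Hypothetical answers to the questions of one (component) process.
print(ordinal_median([1, 3, 3]))
print(ordinal_median([1, 2, 3, 5]))
```

Rounding down rather than up is the conservative choice: a process is never credited with a capability level that fewer than half of its answers support.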

[0174] The procedure of substituting SPICE levels for the median scores of the respective process ratings implies
the belief that in general these medians are not too far off the actual assessment outcomes. This hypothesis could
only be tested based on pairs of assessment and questionnaire results for a number of companies.
[0175] Within the PETS project, such data were available for one company, Procedimientos-Uno. In June 1998 and
March 1999, the European Software Institute had carried out two informal SPICE assessments of various software
development processes. Having in mind the former situation at Procedimientos-Uno, one of their employees filled in
the PETS questionnaire. For fourteen (component) processes (including one combination of two processes) the
assessment results and questionnaire medians are jointly available; these processes are listed in table 4.



Table 4: Processes assessed by the ESI and evaluated with the questionnaire

Identifier     Process name
ENG.1.2        Software requirements analysis process
ENG.1.3        Software design process
ENG.1.4        Software construction process
ENG.1.5/1.6    Software integration process / Software testing process
MAN.2          Project management process
MAN.3          Quality management process
SUP.1          Documentation process
SUP.2          Configuration management process
SUP.3          Quality assurance process
SUP.8          Problem resolution process
CUS.3          Requirements elicitation process
ORG.2.1        Process establishment process
ORG.3          Human resource management process
ORG.4          Infrastructure process



[0176] For ten of the fourteen processes the median values obtained were in accordance with the assessment
outcomes. The assessed SPICE level was overestimated by one level in two cases and underestimated by one level in
two other cases. Since the overestimations concerned the same SPICE level, which is also true for the underestimations,
the results can graphically be displayed in the form of figure 9. (Remember that for ordinal data the distances between
the values are not defined. Therefore, it is not possible to regard a deviation of one from level 2 to be as large as a
deviation of one from level 4.) In order to protect confidential information, the SPICE capability levels themselves are
not revealed.



Figure 9: Results of comparing the median values to the outcomes of the SPICE assessment (horizontal axis:
deviation, i.e. median score - assessed SPICE level, with values -1, 0 and 1; vertical axis: number of processes)

[0177] The findings seem to confirm that the scenarios developed follow the general line of process improvement
described by the reference model of ISO/IEC 15504 TR. If these encouraging results should be corroborated by future
research, then the PETS questionnaire might rightfully be employed to get a first impression of the process maturity
of a company or even be used as one instrument in preparation of interviews for a standard SPICE assessment.
[0178] Of course, even if the questionnaire should come to the same conclusions as a SPICE assessment, this does
in no way validate the SPICE model itself, i.e., prove that this model does in fact measure the quality of the processes
of an organization. Studies about this issue are being carried out in the course of the experiment phase of the SPICE
project (cf., for example, [6]).

3.3 Analyzing the influence of process capability and project characteristics

[0179] This section tries to investigate whether the model parameter estimates obtained using the ML-SeptupC
approach (cf. section 2.5.3) are in any way related to the answers given in the questionnaire. Questionnaires more or
less completely filled in were available for ten projects: A, B, E, F, Leo, Skartia, VPro, WinDeich, PPwin and SSwin.
[0180] The model parameters have simple interpretations which help in grouping the factors which possibly influence
them:

1. $u_0$ represents the total number of faults in the software at the beginning of testing. One may assume that this
is determined by the general quality of the software development process, by the circumstances of the specific
development project (e.g., the skill of the developers) and by characteristics of the software (for example, its size).

2. The redundancy level r could be related to the maturity of the software testing process and the specific testing 
project. Moreover, it may be influenced by characteristics of the software. The same might be true for the fraction 

^, the proportion of the software executed per test case. However, due to the fact that |(1 - often turns out
to be close to one (see section 2.5.3), it may be possible to estimate J based on r and the number of test cases
necessary to cover the software on a black box level. 

[0181] Of course, the interpretations of the model parameters given above only hold if the software under test does
not undergo any significant changes during the time in which the failure data are collected. For example, if existing
features are re-implemented and re-tested in the consecutive system test cycles, then the total number of faults (to
be) found is not equal to the total number of faults inherent in the software at the beginning of testing.
[0182] On the one hand, in order to obtain reliable results, the analysis has to be based on comparable projects. On
the other hand, the already small sample size does not permit a substantial further reduction by omitting several
projects. Therefore, the following decisions have been made:

1. The majority of the baseline projects (B, Leo, Skartia, VPro, SSwin, PPwin) consisted of the first system test
cycle of a new software product. These projects should definitely be examined together, and if possible other
projects should be included in the analysis.

2. The data for project A were collected during four consecutive system test cycles. However, in between the
different cycles additional features were implemented, and these new features were then tested with a low level
of retests of the other features. Therefore, the situation is close to systematic testing of a new software product
within one test cycle. This is confirmed by the fact that for the number of failure occurrences as a function of testing
effort no structural breaks are visible between the test cycles. (The structural break in the failure time data seems
to be connected to the change in the testing personnel after the first test cycle and the resulting decrease in the
speed of test case execution.) For that reason, the entire data set is treated as belonging to one system test cycle
of a new software product.

3. For project E, failure data for two integration test cycles and two system test cycles are available. The data set
of the earlier system test cycle has been included in the analysis.

4. The application tested in the WinDeich project was not completely new. However, it had been ported to a new
database engine, which had entailed the re-engineering of large parts of the existing application. Therefore, it
seems reasonable to analyze this project together with the other ones.

5. Project F started in mid 1999. The data examined in the Baseline Experiments report [5] were collected during
more than ten system test cycles that took place from November 2001 to April 2002. That they relate to generally
stable software which has already existed for some time shows in the low number of failures experienced per test
cycle. Furthermore, there is a general trend of increased software quality from release to release, resulting in a
decreasing slope of the cumulative number of failure occurrences. While the partial redundancy model seems to be
able to fit the failure data (cf. figure 35), comparing the estimated model parameters to the ones of the other
projects does not seem to be reasonable. It has therefore been decided to drop project F from the investigation of
the influence of environmental factors on the model parameters. However, for the examination of associations
between various environmental factors the questionnaire of project F can be exploited.

3.3.1 Influence on the estimated number of inherent faults 

[0183] As mentioned above, one may expect that the total number of faults in an application depends on the capability
of the software development processes as well as environmental factors connected to the specific project and product.
Moreover, the size of a software program has often been assumed to be a driving factor for the number of defects in
it. Therefore, many existing models - like the one by Takahashi and Kamayachi [40], the one developed at the Air
Force's Rome Laboratory (cf. [7] and [29, ch. 7]) and Malaiya's and Denton's model [32] - try to explain the fault density,
the number of faults divided by the software size. Typically, the measure of size used is the number of (non-commentary)
kilo lines of source code. For several baseline experiments of the PETS project this figure was not available. A quantity
that could be provided is the size of the compiled code. Using this piece of information, the fault density in terms of
faults per MByte of compiled code was determined. To what extent this measure depends on various environmental
factors will be investigated with univariate descriptive analyses in the following sections. Based on the findings,
multivariate models are then proposed.

3.3.1.1 Univariate analysis of software development process maturity

[0184] For the 24 processes included in the questionnaire, the scores were obtained by computing the median value
of the answers given to the related questions. Some processes were not applicable to all ten projects for which
questionnaires were filled in; for example, some projects did not involve software suppliers and therefore did not require
any acquisition processes. In addition, sometimes processes were not rated for other reasons. In the list of processes
in table 5 the number of projects for which a process was not applicable or not evaluated is included.
[0185] One may expect that a company that shows a good performance in one process also attains high scores in
other processes. This hypothesis can be checked by calculating a measure of association for all pairs of processes.
A measure designed for ordinal-level variables is Goodman's and Kruskal's $\gamma$, which is based on the number of
concordant and discordant pairs of observations. When determining $\gamma$ for two processes A and B, a pair of observations
(in our case, two questionnaires) is concordant if in one of the questionnaires both processes get a higher score than in
the other questionnaire. If in one questionnaire one process is rated higher than in the other questionnaire, while the
other process is rated lower, then the pair of observations is said to be discordant. Observations for which one (or both)
of the processes attain the same score in both questionnaires are called "tied".



Table 5: Number of baseline experiments (out of 10) for which the processes were not applicable or for which ratings
were not available

Identifier   Process name                                      # not applicable / # NA
ENG.1.1      System requirements analysis and design process   0/0
ENG.1.2      Software requirements analysis process            1/0
ENG.1.3      Software design process                           0/0
ENG.1.4      Software construction process                     0/0
ENG.1.5      Software integration process                      0/0
ENG.1.7      System integration and testing process            0/0
MAN.2        Project management process                        0/0
MAN.3        Quality management process                        0/0
MAN.4        Risk management process                           0/0
SUP.1        Documentation process                             0/0
SUP.2        Configuration management process                  0/0
SUP.3        Quality assurance process                         0/0
SUP.4/5      Verification process / Validation process         0/0
SUP.6        Joint review process                              5/0
SUP.8        Problem resolution process                        0/0
CUS.1        Acquisition process                               3/2
CUS.2        Supply process                                    3/0
CUS.3        Requirements elicitation process                  1/0
ORG.2.1      Process establishment process                     0/0
ORG.2.3      Process improvement process                       0/0
ORG.3        Human resource management process                 0/0
ORG.4        Infrastructure process                            0/0
ORG.5        Measurement process                               0/0
ORG.6        Reuse process                                     1/0








[0186] With $N_c$ and $N_d$ denoting the number of concordant and discordant pairs, respectively, $\gamma$ is then
calculated as follows:

$$\gamma = \frac{N_c - N_d}{N_c + N_d}$$



[0187] When there are only concordant (and tied) pairs, but no discordant ones, then $\gamma$ is equal to one. This indicates
the maximum positive association between the two process ratings, which means that higher scores for one of the
processes tend to occur jointly with higher scores for the other one. Similarly, $\gamma$ attains its minimum value, minus one,
if only discordant and tied pairs are present in the data. A sufficient (but not necessary) condition for $\gamma$ being equal to
zero is statistical independence between the two variables.
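
The pair counting behind Goodman's and Kruskal's measure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `goodman_kruskal_gamma` and the sample ratings are hypothetical, projects with a missing rating for either process are assumed to have been removed, and the measure is the usual ratio of the difference to the sum of concordant and discordant pairs.

```python
from itertools import combinations


def goodman_kruskal_gamma(ratings_a, ratings_b):
    """Return gamma = (Nc - Nd) / (Nc + Nd) for two paired ordinal ratings."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both processes need one rating per questionnaire")
    concordant = discordant = 0
    for (a1, b1), (a2, b2) in combinations(zip(ratings_a, ratings_b), 2):
        direction = (a1 - a2) * (b1 - b2)
        if direction > 0:
            concordant += 1      # both processes rated higher in the same questionnaire
        elif direction < 0:
            discordant += 1      # one process rated higher, the other lower
        # direction == 0: the pair is tied and does not enter the formula
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when all pairs are tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical median scores of two processes in four questionnaires.
print(goodman_kruskal_gamma([1, 2, 3, 3], [1, 2, 2, 3]))
```

Because tied pairs are ignored, the measure can reach plus or minus one even when many questionnaires share the same score, which matters for a sample of only ten projects.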

[0188] The values of $\gamma$ computed for all pairs of process ratings are listed in the two parts of table 6.
[0189] Indeed, almost all of the process ratings are positively associated, many of them to a high degree. Merely the
scores of the software requirements analysis process ENG.1.2 and the risk management process MAN.4 show a
(weak) negative association to some of the other process ratings.

[0190] Due to the high redundancy in the data implied by these findings, it seems possible to further condense the 
process scores without losing much information. Furthermore, the questionnaire may probably be shortened by omitting 
questions on several of the processes. 

[0191] An obvious way of compressing the process scores is to calculate the median rating for each process area.
Then the influence of these different categories on the fault density can be analyzed. While the scores are ordinal-level
variables, the fault density is measured on a quantitative scale. For this constellation, no generally accepted measure
of association seems to be available. Either the fault density would have to be grouped for the calculation of Goodman's
and Kruskal's $\gamma$, or the capability ratings would have to be interpreted as qualitative variables in an analysis of variance.
In either case, information would be lost. Moreover, the latter type of analysis has two additional drawbacks:

• It entails the necessity to calculate the mean effect of each capability level on fault density. Thus, up to six (for
levels 0 to 5) effects would have to be determined from nine data points; therefore, several computations would
have to be based on only one or two observations. The small number of projects hardly allows this kind of
analysis.

• Since the analysis of variance is based on one qualitative variable, it is not appropriate to check whether the
relationship between the two variables is monotonic.



[0192] Although it is problematic from a methodological point of view, in market research and other industrial
applications of statistics, ordinal-scale variables with seven or even five categories are often treated as quantitative
variables. Several of the fault density models mentioned above (the Rome Lab model and the one by Malaiya and Denton)
do the same by replacing the levels of ordinal variables by quantitative factors that are used as the model input.
[0193] Therefore, in the following paragraphs the scores of individual processes and the median scores of sets of
processes are considered to be (discrete) quantitative.
[0194] Moreover, project F is being dropped due to the reasons stated above.

[0195] A measure of linear association between two quantitative variables $X$ and $Y$ is the correlation coefficient
$\rho$. It is calculated by dividing the covariance $s_{XY}$ by the square roots of the variances of $X$ and $Y$, $s_X^2$ and
$s_Y^2$. If all observations $(x_i, y_i)$ are located on a line with positive slope, then $\rho$ is equal to 1. Similarly, if the
relationship is perfectly described by a line with negative slope, then calculation of $\rho$ results in -1. Statistical
independence of $X$ and $Y$ is sufficient, but not necessary for $\rho$ to be 0. This means that the two variables can be
dependent even if a $\rho$ of 0 is obtained; however, this relationship is definitely not linear.
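
As a worked illustration of this definition, the following Python sketch computes the correlation coefficient from its ingredients. The function name `correlation` and the sample values are hypothetical; they are not PETS data. The last example shows two variables that are clearly dependent (one is the square of the other) although their correlation is zero.

```python
import math


def correlation(xs, ys):
    """Covariance divided by the square roots of both variances."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least two observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The common divisor (n or n - 1) cancels, so plain sums are sufficient.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return s_xy / math.sqrt(s_xx * s_yy)


print(correlation([1, 2, 3], [2, 4, 6]))    # points on a line with positive slope
print(correlation([1, 2, 3], [3, 2, 1]))    # points on a line with negative slope
print(correlation([-1, 0, 1], [1, 0, 1]))   # y = x**2: dependent, yet uncorrelated
```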

[0196] In table 7 the correlations between the process area scores and the fault density (FDEN) are listed. Most of
them are negative (the better the processes, the smaller the fault density); only for the management process category
a slight positive influence is indicated. In terms of absolute values, all of the effects are relatively small.



Table 7: Correlations between process category scores and (the logarithm of) fault density

         FDEN      ln(FDEN)
ENG     -0.101    -0.322
MAN      0.081    -0.261
SUP     -0.221    -0.446
CUS     -0.326    -0.572
ORG     -0.362    -0.663



[0197] Using the logarithm of fault density instead of fault density itself intensifies the degree of linear relationship
with the process category ratings. This suggests that the fault density increases more than linearly when a process
category score is decreased. Moreover, all correlations are now negative. Nevertheless, the dependencies are still
moderate.

[0198] Interestingly, the score showing the highest correlations with fault density and its logarithm is not the one of
the engineering process category (consisting of those processes most closely related to software development), but the
one of the organization process category (concerned with establishing and supporting the business goals of an
organization). However, the scatter plot in figure 10 reveals that the rating for the ORG process category only takes two
different values. Therefore, merely one project is responsible for the correlations observed.



Figure 10: Scatter plot of capability rating for ORG process category and the logarithm of fault density (horizontal
axis: capability rating of ORG process category)



[0199] Instead of condensing the ratings of one process category, one might use selected processes that seem to
contribute most to the explanation of the fault density of a software product. The correlations between all process
scores and the (logarithm of the) fault density are listed in table 8.

Table 8: Correlations between all process scores and (the logarithm of) fault density

             FDEN      ln(FDEN)
ENG.1.1     -0.101    -0.322
ENG.1.2     -0.393    -0.454
ENG.1.3     -0.328    -0.532 *
ENG.1.4     -0.305    -0.459 *
ENG.1.5      0.469     0.275
ENG.1.7     -0.007    -0.327
ENG.2       -0.101    -0.322
MAN.2        0.280    -0.080
MAN.3       -0.151    -0.441
MAN.4        0.478     0.429
SUP.1        0.134    -0.191
SUP.2        0.205    -0.110
SUP.3       -0.235    -0.556 *
SUP.4       -0.526    -0.636 *
SUP.6       -0.095    -0.577
SUP.8       -0.370    -0.624 *
CUS.1       -0.662    -0.914
CUS.2       -0.747    -0.772
CUS.3        0.144    -0.238
ORG.2       -0.114    -0.423
ORG.3       -0.414    -0.601 *
ORG.4       -0.362    -0.663 *
ORG.5       -0.078    -0.416
ORG.6        0.360     0.065

* highlighted value: one of the seven strongest correlations with ln(FDEN) apart from SUP.6, CUS.1 and CUS.2



Again, almost all process ratings are negatively correlated with the logarithm of the fault density, while a number of
them are positively correlated with fault density. Furthermore, taking the logarithm of fault density significantly increases
the strength of the linear relationship in many cases.

[0200] While the correlation coefficients for the processes SUP.6, CUS.1 and CUS.2 are all high in absolute terms,
from table 5 it can be seen that they are only based on five or seven observations. The smaller number of data points
per process maturity rating may be responsible for the reduced fluctuation around the assumed linear relationship.
Since these processes are often not applicable or difficult to evaluate, predictions should probably not be based on
them. These processes are therefore left aside for the time being. If further projects with ratings for them should be
available in the future, they might be included in the analysis again.

[0201] The other seven strongest correlations for the logarithm of fault density are highlighted in table 8. They are
all negative, ranging from -0.459 to -0.663, and they are related to the (component) processes ENG.1.3, ENG.1.4,
SUP.3, SUP.4, SUP.8, ORG.3 and ORG.4. Table 6 indicates that the scores of these processes are strongly associated.
It therefore makes sense to combine them to one rating by determining their median value.
[0202] As shown in table 9, in absolute terms the correlations of the resulting selective maturity rating (SMAT) with
both fault density and the logarithm of fault density are higher than for any of the single processes that contributed to it.
While condensing the process ratings hardly diminishes the information content, it helps to balance outliers.
Furthermore, the linear relationship between the new rating and (the logarithm of) fault density is stronger than for any
of the process category marks analyzed in table 7, which could be expected.



Table 9: Correlations between SMAT and (the logarithm of) fault density

          FDEN      ln(FDEN)
SMAT     -0.533    -0.690



[0203] The scatter plot of the selective maturity rating and the logarithm of fault density is depicted in figure 11.
Obviously, SMAT is more successful in discriminating between the projects than the score obtained for the ORG process
category, which only attained two different values (cf. figure 10).



Figure 11: Scatter plot of selective maturity rating SMAT and the logarithm of fault density (horizontal axis:
selective maturity rating, 0.0 to 3.0)

3.3.1.2 Univariate analysis of other environmental factors

[0204] As already noted in section 3.1, based on work by Takahashi and Kamayachi [40] and Zhang and Pham [44],
environmental factors related to the specific development project and software product were identified and
operationalized further. The quantitative variables derived from the answers to the questionnaire are listed and explained
in table 10. In the questionnaire, some questions concerning qualitative variables (e.g., the target topology of the
software) are contained as well. However, as mentioned above, in order to determine a descriptive measure of the
influence of a qualitative variable the mean effect per category has to be calculated. The current number of projects
seems to be too small for conducting this type of analysis. It may be reasonable to take the qualitative factors into
account later on, when data on additional projects are available.



Table 10: Environmental factors related to software development and the software product

Identifier   Explanation                                                                          # of NA
DEFF         Actual effort of development excluding system test in person months                 1
DEPI         Development effort performance index: Ratio of actual development effort            2
             to planned development effort
DRUN         Runtime of development excluding system test (in months)                            1
DRPI         Development runtime performance index: Ratio of actual development runtime          1
             to planned development runtime
DDIF         Development difficulty: DDIF = (…) / DRUN                                            1
OMSK         Development team managers' skill: (Average) Number of years of experience           1
             in managing software projects
PGSK         Programmers' general skill: Average number of years of professional                 2
             programming experience
PSSK         Programmers' specific skill: Average number of similar projects (w.r.t.             1
             programming language, software architecture, etc.) that the programmers
             have worked on before
DTSI         Size of the development team: Maximum number of team members                        1
PNDT         Percentage of new members in the development team                                   2
PRCH         Percentage of requirements changed after the specification phase                    1
SICC         Size of the compiled code in MByte                                                   0
PRCO         Percentage of code reused from other applications or libraries                      1



[0205] The correlations between the quantitative environmental factors are shown in table 11.
[0206] The highest pairwise associations seem to exist in the set consisting of development effort, development
runtime, size of the development team and the size of compiled code, which is reasonable.
[0207] In table 12, the correlations between all quantitative factors related to development and the product and (the
logarithm of) fault density are listed. According to these results, the percentage of requirements changed after the
requirements phase and the development runtime performance index are the factors most strongly correlated with
fault density. As for the logarithm of fault density, the size of compiled code and the development effort performance
index achieve the highest correlations.

Table 12: Correlations between environmental factors and (the logarithm of) fault density

          FDEN      ln(FDEN)
DEFF      …         …
DEPI      0.583     …
DRUN     -0.075     …
DRPI      …         …
DDIF     -0.560     …
OMSK      0.466     0.405
PGSK      0.650     0.458
PSSK      0.229     0.220
DTSI     -0.256    -0.066
PNDT      0.060     0.192
PRCH      0.752     0.480
SICC     -0.434    -0.668
PRCO     -0.044     0.208



[0208] From table 11, it can be seen that both the correlation between the percentage of requirements changes and
the development runtime performance index (0.369) and the one between the size of compiled code and the
development effort performance index (0.331) are relatively weak. Therefore, it may be reasonable to include the
factors PRCH and DRPI in a model for the fault density and the factors SICC and DEPI in a model for the logarithm
of fault density. Whether the selective maturity rating SMAT discussed above should be incorporated as well (or instead
of one of the other factors), and which of the multivariate models to select, will be discussed in the next section.

3.3.1.3 Multivariate analysis

[0209] So far, the correlations between one environmental factor and (the logarithm of) fault density have been
studied. In a linear regression model, several factors can be included as exogenous variables trying to explain the
endogenous variable. For example, when regressing fault density on the development runtime performance index, the
percentage of requirements changed and the selective maturity rating, the model has the form

$$\mathrm{FDEN}_i = \beta_0 + \beta_1\,\mathrm{DRPI}_i + \beta_2\,\mathrm{PRCH}_i + \beta_3\,\mathrm{SMAT}_i + e_i \qquad (45)$$

with the residuals $e_i$ representing the disturbances of the linear relationship.

[0210] The fit of such a model is the better the smaller the error sum of squares (SSE) is, which is the sum of squared deviations of the observed fault densities from the respective values on the regression hyper-plane. In the best case, all data points are located on the hyper-plane, and the SSE is zero. On the contrary, if the exogenous variables do not make any contribution to the explanation of the endogenous variable, then the SSE is equal to the total sum of squares (SST) of the endogenous variable, the sum of squared deviations of the observed values from their mean. Therefore the coefficient of determination, defined as

$$R^2 = 1 - \frac{SSE}{SST},$$

is bounded by zero and one and is the larger the better the model fits the data.

[0211] Including additional explanatory variables in a model cannot decrease the coefficient of determination. However, the extra gain may be too small to justify the higher complexity of the model. In order to account for this the adjusted coefficient of determination

$$\bar{R}^2 = 1 - \frac{SSE}{SST} \cdot \frac{n-1}{n-k-1}$$

has been suggested [34]. It penalizes for the number of exogenous variables k, relating it to the number of available observations n.
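The two fit measures just defined can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the fitted values are assumed to come from some regression that has already been estimated.

```python
def r_squared(observed, fitted):
    """Coefficient of determination R^2 = 1 - SSE / SST."""
    mean = sum(observed) / len(observed)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # error sum of squares
    sst = sum((y - mean) ** 2 for y in observed)               # total sum of squares
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k exogenous variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
```

As a plausibility check, `adjusted_r_squared(0.994, 7, 2)` evaluates to about 0.991, which agrees with the entry for model (I) in table 13 (two regressors, seven of the nine projects with complete data).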

[0212] The two factors most highly correlated with fault density are the development runtime performance index and the percentage of requirements changed. All subsets of these factors and the selective maturity rating have been used to model fault density. The resulting (adjusted) coefficients of determination are listed in table 13. The number of projects (out of the nine projects studied) for which the data were not available are shown as well. Each model is abbreviated by the explanatory variables concatenated with "+", omitting the fact that an intercept is always included. For example, the model specifically indicated as (III) is the model of equation (45).



Table 13: Fits of models for fault density

Model                   R^2     adjusted R^2   # of NA
DRPI                    0.474   0.387          1
PRCH                    0.566   0.493          1
SMAT                    0.284   0.181          0
(I) DRPI+PRCH           0.994   0.991          2
DRPI+SMAT               0.488   0.283          1
(II) PRCH+SMAT          0.956   0.939          1
(III) DRPI+PRCH+SMAT    0.996   0.991          2



[0213] The three models with the largest (adjusted) coefficients of determination are marked with (I) to (III). In table 14, the "real" fault densities (based on the estimated values of $u_0$) of the projects and the estimates according to these three linear models are shown.



Table 14: "Real" and estimated fault densities for selected models

              A         B        E       Leo      Skartia   VPro     WinDeich   PPwin     Siwin
"Real" FDEN   170.701   39.015   5.098   44.375   47.120    12.697   22.152     106.835   184.242
Est. (I)      172.889   NA       NA      48.483   38.565    17.232   25.232     99.942    185.780
Est. (II)     141.877   NA       4.368   37.706   37.706    23.540   28.852     124.169   195.002
Est. (III)    170.905   NA       NA      46.178   37.409    17.679   25.078     105.320   185.554



[0214] Model (II) provides the least preferable fit of the three models, but it can be recommended for projects for which the development runtime performance index is not available. If this piece of information is known, then either model variant (I) or (III) should be used. For the latter one the coefficient of determination is slightly larger while the adjusted coefficient is the same. If the selective maturity rating has been determined, including it in the model may be worthwhile.

[0215] However, a larger number of exogenous variables may give rise to the problem of multicollinearity [11]: Strong dependencies between the explanatory variables result in unbiased but inefficient estimators. In extreme cases, the estimates cannot be calculated at all. Therefore, it should be checked whether multicollinearity is an issue in model (III). One measure of multicollinearity is based on the correlation matrix of all exogenous variables (including the constant related to the intercept $\alpha_0$). With $\lambda_{\min}$ and $\lambda_{\max}$ denoting the smallest and largest eigenvalue of this matrix, respectively, the so-called condition number C is calculated as

$$C = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}}.$$

Values of C larger than 20 indicate potential problems due to multicollinearity [11].

[0216] The condition number determined for model (III) is 32.915. Consequently, model (I) - for which the condition number is 7.952 - should be preferred to model (III).
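The condition number check can be sketched as follows. The text does not spell out how the matrix is built, so this sketch assumes one common convention: every regressor column (including the constant) is scaled to unit length and the eigenvalues of the resulting cross-product matrix are used. The Jacobi eigenvalue routine and both function names are hypothetical helpers chosen to keep the example free of external libraries.

```python
import math


def symmetric_eigenvalues(matrix, sweeps=100, tol=1e-14):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def condition_number(columns):
    """C = sqrt(lambda_max / lambda_min) for unit-length scaled regressor columns."""
    scaled = []
    for col in columns:
        norm = math.sqrt(sum(v * v for v in col))
        scaled.append([v / norm for v in col])
    moment = [[sum(x * y for x, y in zip(ci, cj)) for cj in scaled] for ci in scaled]
    eigenvalues = symmetric_eigenvalues(moment)
    return math.sqrt(eigenvalues[-1] / eigenvalues[0])
```

With a constant column and a regressor orthogonal to it the function returns 1, the smallest possible value; two almost parallel columns push the result far beyond the threshold of 20 quoted above.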

[0217] As for the association with the logarithm of fault density, the development effort performance index and the size of compiled code showed the highest correlation. All regression models for ln(FDEN) with subsets of these two factors and the selective maturity rating as exogenous variables are examined in table 15. The highest adjusted coefficients of determination are achieved for the linear models combining this maturity score with one of the other factors, indicated by (IV) and (V). While the "full" model including all three variables has the same coefficient of determination as variant (IV), it is less favorable due to the larger number of regressors employed.

[0218] Comparing the level of the (adjusted) coefficients of determination to those obtained for the linear models of fault density itself, it is clear that the models for its logarithm are less successful in explaining the variation in the values of the endogenous variable.

Table 15: Fits of models for the logarithm of fault density

Model                R^2     adjusted R^2   # of NA
DEPI                 0.334   0.201          2
SICC                 0.447   0.368          0
SMAT                 0.476   0.401          0
DEPI+SICC            0.586   0.379          2
(IV) DEPI+SMAT       0.785   0.678          2
(V) SICC+SMAT        0.680   0.573          0
DEPI+SICC+SMAT       0.785   0.571          2



[0219] This is even more obvious when examining the estimates of fault density according to the models (cf. table 16) and contrasting them with the "real" values and the results listed in table 14. Evidently, the performance of the models for the logarithm of fault density is much poorer than the performance of the models explaining fault density.



Table 16: "Real" and estimated fault densities for selected models

              A         B        E       Leo      Skartia   VPro     WinDeich   PPwin     Siwin
"Real" FDEN   170.701   39.015   5.098   44.375   47.120    12.697   22.152     106.835   184.242
Est. (IV)     152.617   NA       NA      27.737   27.678    27.678   27.678     149.481   147.281
Est. (V)      74.900    83.024   3.708   39.432   40.360    40.087   37.084     84.936    84.832



[0220] To conclude, the best models detected based on the data sets available are all linear models of fault density. If both the development runtime performance index and the percentage of requirements changed after the specification phase are available, then the following predictive equation can be recommended:

$$\mathrm{FDEN}_i = -57.522 + 69.421 \cdot \mathrm{DRPI}_i + 266.660 \cdot \mathrm{PRCH}_i$$

[0221] The second best choice, to be used if the development runtime performance index is not known (while the selective maturity rating is), takes the form

$$\mathrm{FDEN}_i = 157.51 + 177.08 \cdot \mathrm{PRCH}_i - 68.75 \cdot \mathrm{SMAT}_i.$$

[0222] For the i-th project the parameter $u_0$ of the partial redundancy model can then be estimated by

$$\hat{u}_{0,i} = \mathrm{SICC}_i \cdot \mathrm{FDEN}_i.$$



[0223] It is well understood that these findings are based on a small set of observations. In order to obtain better estimates, data on more projects will be necessary. Furthermore, due to the shortage of data points, it was not possible to derive parameter estimates from a subset of observations and to test the predictive performance of the resulting model using the complementary set. This type of analysis is reserved for the future.
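The two recommended predictive equations and the estimate of $u_0$ can be written down directly. The sketch below is illustrative only: the function names are hypothetical, the inputs are assumed to be expressed in the same units as in the study, and the input values used in the usage note are made up.

```python
def predict_fault_density(prch, drpi=None, smat=None):
    """Predict FDEN from PRCH plus either DRPI (model I) or SMAT (model II)."""
    if drpi is not None:
        # Recommended equation when DRPI and PRCH are both available.
        return -57.522 + 69.421 * drpi + 266.660 * prch
    if smat is not None:
        # Second best choice when DRPI is unknown but SMAT has been determined.
        return 157.51 + 177.08 * prch - 68.75 * smat
    raise ValueError("either drpi or smat must be given")


def estimate_initial_faults(sicc, fden):
    """Estimate u_0 as size of compiled code times fault density."""
    return sicc * fden
```

For example, with the purely hypothetical inputs `prch=0.1` and `drpi=1.2` the first equation gives a fault density of about 52.45, while `prch=0.1` and `smat=2` give about 37.72 under the second equation.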

3.3.2 Influence on the redundancy level 

3.3.2.1 Univariate analysis

[0224] In the partial redundancy models the redundancy level r stands for the fraction of executed code constructs that are replaced after each test case. Therefore, it is bounded between zero and one. When taking a look at the estimates obtained for the data sets available (cf. table 3), two things become obvious:

1. Both the theoretical minimum and maximum values are actually attained.

2. As for the minimum value, it even arises for more than half of the data sets available (and also more than half of the data sets selected for analysis at the beginning of section 3.3).

[0225] According to the first remark, a model for the redundancy level will have to allocate a non-zero probability to the theoretical bounds. Moreover, the model used must be able to explain the excess zeros observed. Classical model families can hardly account for that. It therefore seems reasonable to look for a mixed model that concurrently models

1. the probability for the redundancy level to be equal to zero and

2. the value the redundancy level takes if it is not equal to zero.

While the two questions are handled separately within this subsection, they will be integrated in one model in the following one.

[0226] Concentrating on the first question, for the i-th project the piece of information of interest is whether the redundancy level $r_i$ estimated for the model is equal to zero or not. This can be expressed by the indicator variable $Y_i = I_{\{0\}}(r_i)$, which takes the value one if $r_i = 0$ and the value zero otherwise. A common approach to model the influence of some factor x on this binary variable is to assume that $Y_i$ depends on an unobservable variable $Y_i^*$ in the following way:

$$Y_i = \begin{cases} 1 & \text{if } Y_i^* > 0, \\ 0 & \text{otherwise.} \end{cases}$$

[0227] The expected value of $Y_i^*$ is in turn thought to be linearly related to the value of the factor $x_i$:

$$E(Y_i^* \mid x_i) = \alpha_0 + \alpha_1 x_i.$$

In the logit model this leads to the probabilities

$$P(r_i = 0 \mid x_i) = P(Y_i = 1 \mid x_i) = P(Y_i^* > 0 \mid x_i) = \frac{1}{1 + \exp(-\alpha_0 - \alpha_1 x_i)}$$

and

$$P(Y_i = 0 \mid x_i) = \frac{\exp(-\alpha_0 - \alpha_1 x_i)}{1 + \exp(-\alpha_0 - \alpha_1 x_i)}.$$

[0229] Based on the set of independent observations $\{x_i, y_i\}$ for n different projects, the coefficients $\alpha_0$ and $\alpha_1$ can be estimated by maximizing the log-likelihood function

$$
\begin{aligned}
\ln L(\alpha_0, \alpha_1; x_1, \ldots, x_n, y_1, \ldots, y_n)
&= \ln \left[ \prod_{i=1}^{n} P(Y_i = 1 \mid x_i)^{y_i} \, P(Y_i = 0 \mid x_i)^{1 - y_i} \right]
 = \ln \left[ \frac{\prod_{i=1}^{n} \exp(-\alpha_0 - \alpha_1 x_i)^{1 - y_i}}{\prod_{i=1}^{n} \left(1 + \exp(-\alpha_0 - \alpha_1 x_i)\right)} \right] \\
&= -\sum_{i=1}^{n} (1 - y_i)(\alpha_0 + \alpha_1 x_i) - \sum_{i=1}^{n} \ln\left[1 + \exp(-\alpha_0 - \alpha_1 x_i)\right].
\end{aligned}
$$
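The logit probabilities and the log-likelihood above translate into code almost literally. The following is a minimal sketch, not an estimation routine: the function names are hypothetical, and the maximization over $\alpha_0$ and $\alpha_1$ would have to be delegated to a numerical optimizer of the reader's choice.

```python
import math


def prob_zero_redundancy(alpha0, alpha1, x):
    """P(Y_i = 1 | x_i): probability that the redundancy level is zero."""
    return 1.0 / (1.0 + math.exp(-alpha0 - alpha1 * x))


def logit_log_likelihood(alpha0, alpha1, xs, ys):
    """Log-likelihood in the closed form given at the end of the derivation."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = alpha0 + alpha1 * x
        total += -(1 - y) * eta - math.log(1.0 + math.exp(-eta))
    return total
```

The closed form agrees with the direct Bernoulli expression, the sum of y_i ln P(Y_i = 1 | x_i) + (1 - y_i) ln P(Y_i = 0 | x_i), which is a convenient way to check an implementation.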



[0230] Due to the small variability of the $y_i$'s (that are either zero or one), calculating the coefficient of determination based on the error sum of squares with respect to this observable variable, $SSE = \sum_{i=1}^{n} \left(y_i - P(Y_i = 1 \mid x_i)\right)^2$, does not seem to be reasonable. Several measures to be used instead have been proposed.*

* For the following discussion cf. Klein [26].

[0231] Since the criterion for deriving the parameter estimates is the maximization of the log-likelihood, McFadden suggested to use a measure based on the comparison of the log-likelihood of the full model (46) - calculated at the optimizing parameter estimates $\hat{\alpha}_0$ and $\hat{\alpha}_1$ - to the maximum value of the log-likelihood of a model in which the factor x is assumed to have no influence. This restricted log-likelihood (denoted by $\ln L_0$) has the form

$$\ln L_0(\alpha_0; y_1, \ldots, y_n) = -\sum_{i=1}^{n} (1 - y_i)\,\alpha_0 - n \ln\left[1 + \exp(-\alpha_0)\right].$$

With $\tilde{\alpha}_0$ denoting the value maximizing $\ln L_0$, McFadden's measure

$$R^2_{MF} = 1 - \frac{\ln L(\hat{\alpha}_0, \hat{\alpha}_1)}{\ln L_0(\tilde{\alpha}_0)}$$

is bounded by zero below, but the theoretically possible maximum value of one can generally not be reached.

[0232] Aldrich and Nelson [1] propose a different measure,

$$R^2_{AN} = \frac{2\left(\ln L(\hat{\alpha}_0, \hat{\alpha}_1) - \ln L_0(\tilde{\alpha}_0)\right)}{2\left(\ln L(\hat{\alpha}_0, \hat{\alpha}_1) - \ln L_0(\tilde{\alpha}_0)\right) + n},$$

which builds on the likelihood ratio statistic

$$LR = 2\left(\ln L(\hat{\alpha}_0, \hat{\alpha}_1) - \ln L_0(\tilde{\alpha}_0)\right).$$

[0233] McKelvey's and Zavoina's measure sticks closely to the analogy of the classical coefficient of determination. However, not the variation of $Y$, but the variation of the unobservable variable $Y^*$ is used. This is done by estimating the values of $y_i^*$ via the assumed linear relationship with the factor x:

$$\hat{y}_i^* = \hat{\alpha}_0 + \hat{\alpha}_1 x_i.$$

[0234] Additionally approximating the error sum of squares by n, the total sum of squares is given by n plus the variation of the estimated values $\hat{y}_i^*$,

$$SST^* = n + \sum_{i=1}^{n} \left(\hat{y}_i^* - \bar{\hat{y}}^*\right)^2,$$

which leads to the pseudo-$R^2$ measure

$$R^2_{MZ} = \frac{\sum_{i=1}^{n} \left(\hat{y}_i^* - \bar{\hat{y}}^*\right)^2}{n + \sum_{i=1}^{n} \left(\hat{y}_i^* - \bar{\hat{y}}^*\right)^2}.$$

If the evaluation of quality of fit is to be based on the explanation of the variation of the latent variable $Y^*$ by the exogenous factor x, then it has been shown that $R^2_{MZ}$ is quite successful in reproducing the real but unknown coefficient of determination based on the regression of $Y^*$ on x, while both $R^2_{MF}$ and $R^2_{AN}$ tend to underestimate the quality of fit.

[0235] However, none of these measures is universally accepted. Therefore, all three of them are calculated in the following analysis of environmental factors explaining whether a redundancy level of zero is attained.
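The three pseudo-R^2 measures can be sketched as small helper functions. The names are hypothetical; the maximized log-likelihoods of the full and the restricted model are assumed to be available from a previous fit, and the error sum of squares is approximated by n as described in the text.

```python
def mcfadden_r2(loglik_full, loglik_restricted):
    """McFadden: one minus the ratio of full to restricted log-likelihood."""
    return 1.0 - loglik_full / loglik_restricted


def aldrich_nelson_r2(loglik_full, loglik_restricted, n):
    """Aldrich and Nelson: LR / (LR + n) with the likelihood ratio statistic LR."""
    lr = 2.0 * (loglik_full - loglik_restricted)
    return lr / (lr + n)


def mckelvey_zavoina_r2(alpha0_hat, alpha1_hat, xs):
    """McKelvey and Zavoina: explained variation of the estimated latent values."""
    latent = [alpha0_hat + alpha1_hat * x for x in xs]
    mean = sum(latent) / len(latent)
    explained = sum((v - mean) ** 2 for v in latent)
    return explained / (len(xs) + explained)
```

For instance, made-up log-likelihoods of -3 (full) and -6 (restricted) with n = 9 give a McFadden measure of 0.5 and an Aldrich and Nelson measure of 0.4.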
[0236] As for the investigation of the associations between the (estimated) redundancy level resulting in case that it is not equal to zero and various environmental factors, the correlation coefficient $\rho$ is calculated based on the data of those projects for which r turned out to be larger than zero.

[0237] The first circumstance that may be assumed to have an influence on the two questions discussed is the maturity of testing. In the questionnaire, five questions specifically related to system test processes have been grouped together. For each project, the median value of the scores given to these questions has been calculated. Let this rating be referred to as testing maturity TMAT. In table 17 the results concerning the influence of the testing maturity on the estimated redundancy level are collected.

Table 17: Influence of testing maturity on the estimated redundancy level

         I_{0}(r)                                   {r | r > 0}
Factor   R^2_MF   R^2_AN   R^2_MZ   Direction       rho
TMAT     0.456    0.385    0.893    +               -0.656

Depending on the measure used, the goodness of fit of the logit model seems to be moderate or strong. According to $R^2_{MZ}$, especially the explanatory power of TMAT with regard to the (assumed) variation in the latent variable is high. Remember that the other measures are inclined to underrate this property. The direction of the impact - also shown in table 17 - is reasonable as well: The larger the testing maturity rating, the higher is the probability of attaining a redundancy level of zero. All data sets refer to companies employing systematic testing, whose goals include the avoidance of repeated code executions.

[0238] The sign of the correlation coefficient between the testing maturity and the redundancy level is also as expected: Higher testing maturity tends to decrease the intensity of redundancy in testing. The absolute value 0.656 will be discussed below.

[0239] In addition to testing maturity, information on environmental factors related to the specific testing process was assessed with the questionnaire. Many of them are counterparts of factors examined for the software development process. All quantitative factors are displayed in table 18 together with the number of projects for which no data were available.



Table 18: Environmental factors related to the test process

Identifier   # of NA   Explanation
TEFF         0         Actual effort of the system test cycle in person months
TEPI         1         Testing effort performance index: Ratio of actual testing effort to planned testing effort
TRUN         0         Runtime of system test cycle (in months)
TRPI         0         Test runtime performance index: Ratio of actual testing runtime to planned testing runtime
TDIF         0         Testing difficulty: TDIF = [numerator illegible] / TRUN
TMSK         2         Test team managers' skill: (Average) number of years of experience in managing test projects
TGSK         0         Testers' general skill: Average number of years of professional testing experience
TSSK         0         Testers' specific skill: Average number of similar projects (w.r.t. type of application tested etc.) that the programmers have worked on before
PTED         0         Percentage of testers with a special education as test engineers
TTSI         0         Size of the test team: Maximum number of test team members
PTPR         0         Percentage of testers that have participated in the development of the software under test
PNTT         0         Percentage of new members in the test team
PAUT         0         Percentage of test cases executed by a test automation tool
PREJ         0         Percentage of failure messages generated during the test execution phase that were rejected due to errors in the test specification



[0240] For the sake of completeness, the pairwise correlations between all these environmental factors have been calculated; they are listed in table 19.

[0241] With regard to the influence on the estimated redundancy level, the same analyses as for the testing maturity rating have been conducted. According to the results shown in table 20, most factors merely have a small impact on the probability for achieving a redundancy level of zero. The four factors producing the best fit are the testers' specific and general skill level, the percentage of new members in the test team and the percentage of testers with an education as test engineers. Unfortunately, only the direction of impact attributed to the last one of these factors seems to be reasonable: The higher the fraction of specifically educated testers, the more efficient testing can be. On the contrary, for example the outcomes suggesting that higher general or specific skills on the part of the testers reduce the probability of avoiding redundancy in testing are counterintuitive.

Table 20: Influence of the environmental factors related to testing on the estimated redundancy level

         I_{0}(r)                                                  {r | r > 0}
Factor   R^2_MF            R^2_AN   R^2_MZ   Direction             rho
TEFF     0.163             0.183    0.180    +                     0.719
TEPI     0.198             0.215    0.169    [illegible]           0.656
TRUN     0.096             0.117    0.101    +                     0.696
TRPI     0.201             0.216    0.140    [illegible]           0.656
TDIF     0.020             0.027    0.012    [illegible]           -0.864
TMSK     0.195             0.211    0.194    [illegible]           0.656
TGSK     0.691             0.487    0.994    -                     [illegible]
TSSK     1.000             0.579    0.999    -                     0.656
PTED     0.382             0.344    0.771    +                     NA
TTSI     0.159             0.179    0.164    +                     -0.656
PTPR     0.323             0.307    0.678    [illegible]           -0.656
PNTT     0.3[illegible]    0.344    0.946    [illegible]           NA
PAUT     0.162             0.182    0.119    +                     0.123
PREJ     0.096             0.117    0.102    +                     -0.706



[0242] As for the correlations of the environmental factors with r, it is even doubtful for most of the signs whether they actually make sense. For example, both testing difficulty and the percentage of failure messages that are rejected due to faults in the test specification seem to lead to a higher efficiency in testing. Another point to notice is that correlations with an absolute value of 0.656 show up very often. In fact, since the correlations are calculated from the data of those projects for which r > 0, they are based on four observations only. Two projects each were carried out by the same company. As it turns out, correlations of 0.656 or -0.656 indicate that the respective factor is constant within one company but different across the two companies. Since most factors are able to discriminate between the companies but not between the projects, such correlations appear frequently.
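The recurring value of 0.656 can be reproduced with a short calculation. The sketch below uses the four non-zero estimated redundancy levels listed in table 21 (Leo, Skartia, PPwin, Siwin). Pairing Leo with Skartia and PPwin with Siwin as the two companies is an assumption, suggested by the identical fitted values these pairs share in that table; the factor levels are arbitrary placeholders.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Estimated redundancy levels of Leo, Skartia, PPwin and Siwin (table 21).
redundancy = [0.750, 0.339, 0.709, 1.000]

# Any factor that is constant within a company gives the same absolute value.
print(pearson([0, 0, 1, 1], redundancy))
print(pearson([3.0, 3.0, 7.5, 7.5], redundancy))
print(pearson([7.5, 7.5, 3.0, 3.0], redundancy))
```

Because a factor that takes only two distinct values is just a shifted and scaled company indicator, its correlation with r depends solely on the redundancy levels themselves; only the sign changes with the ordering of the two levels. With the rounded table values the result is about 0.657 in absolute value.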

[0243] Obviously, data on more projects will be necessary to come to better conclusions - especially with regard to the influence of environmental factors on the redundancy level if this level is not equal to zero. For the time being, it seems reasonable (and the best one can do) to assume that the probability for r = 0 is influenced by testing maturity and the percentage of testers with a specific education as test engineers, while the value of r for r > 0 is influenced by testing maturity alone.

[0244] In the following section, a model integrating these two propositions will be applied.



3.3.2.2 Multivariate analysis



[0245] Combining the separate analysis of the cases in which r is equal to zero and the other cases in which it is between zero and one leads to a mixture model consisting of one point distribution, which is realized with probability $\pi^*$, and some other distribution, which comes into play with probability $1 - \pi^*$. The question is which "other" distribution should be used. While one may have the impression that r is a continuous random variable, it is not. As assumption 4 of the partial redundancy model shows, the redundancy level has to be some multiple of the reciprocal value of p, the number of constructs chosen per test case. The rounding procedure described by the equation marked with (39) takes care that this relationship is also true for the estimated quantities r and p. Therefore, r · p is definitely an integer value, bounded between zero and p. Let this integer value estimated for project i be denoted by $o_i$. Due to the upper bound on $o_i$, a distribution permitting to take any integer value - like the Poisson distribution - is not appropriate. Thus, the zero inflated Poisson (ZIP) model introduced by Lambert [30] does not fit for our purposes. Recently, Hall [20] adapted the model for bounded integer values by substituting the binomial for the Poisson distribution.

[0246] The setup of the zero inflated binomial (ZIB) model is as follows: With a probability of $\pi_i^*$ the random variable $O_i$ is from the zero state, and with the complementary probability $(1 - \pi_i^*)$ it is from a binomial distribution with sample size $p_i$ and success probability $\pi_i$. Consequently, the probability distribution of $O_i$ is given by

$$
f(O_i = o_i) =
\begin{cases}
\pi_i^* + (1 - \pi_i^*)(1 - \pi_i)^{p_i} & \text{for } o_i = 0, \\
(1 - \pi_i^*) \binom{p_i}{o_i} \pi_i^{o_i} (1 - \pi_i)^{p_i - o_i} & \text{for } o_i = 1, \ldots, p_i.
\end{cases}
$$

Incorporating the influence of environmental factors, the two probabilities $\pi_i$ and $\pi_i^*$ are modeled like in a logit model. Since we specifically assume that $\pi_i$ is affected by $\mathrm{TMAT}_i$ alone while $\pi_i^*$ is affected by both $\mathrm{TMAT}_i$ and $\mathrm{PTED}_i$, this leads to the propositions

$$
\pi_i^* = \frac{1}{1 + \exp(-\gamma_0 - \gamma_1 \mathrm{TMAT}_i - \gamma_2 \mathrm{PTED}_i)}
= \frac{\exp(\gamma_0 + \gamma_1 \mathrm{TMAT}_i + \gamma_2 \mathrm{PTED}_i)}{1 + \exp(\gamma_0 + \gamma_1 \mathrm{TMAT}_i + \gamma_2 \mathrm{PTED}_i)}
= \frac{\exp(g_i \gamma)}{1 + \exp(g_i \gamma)}
$$

and

$$
\pi_i = \frac{1}{1 + \exp(-\beta_0 - \beta_1 \mathrm{TMAT}_i)}
= \frac{\exp(\beta_0 + \beta_1 \mathrm{TMAT}_i)}{1 + \exp(\beta_0 + \beta_1 \mathrm{TMAT}_i)}
= \frac{\exp(b_i \beta)}{1 + \exp(b_i \beta)},
$$

where

$$g_i = (1 \;\; \mathrm{TMAT}_i \;\; \mathrm{PTED}_i), \qquad \gamma' = (\gamma_0 \;\; \gamma_1 \;\; \gamma_2),$$
$$b_i = (1 \;\; \mathrm{TMAT}_i), \qquad \beta' = (\beta_0 \;\; \beta_1).$$

With the indicator variable $Y_i$ being equal to one if $O_i$ is zero (and $Y_i$ being zero otherwise), the log-likelihood of the ZIB model can then be formulated as follows:

$$
\begin{aligned}
\ln L(\beta, \gamma; o_1, \ldots, o_n) &= \sum_{i=1}^{n} \ln f(O_i = o_i) \\
&= \sum_{i=1}^{n} \Big\{ y_i \ln\big[\pi_i^* + (1 - \pi_i^*)(1 - \pi_i)^{p_i}\big] + (1 - y_i) \ln\big[(1 - \pi_i^*) \tbinom{p_i}{o_i} \pi_i^{o_i} (1 - \pi_i)^{p_i - o_i}\big] \Big\} \\
&= \sum_{i=1}^{n} \Big\{ y_i \ln\big[\exp(g_i \gamma) + (1 + \exp(b_i \beta))^{-p_i}\big] - \ln\big[1 + \exp(g_i \gamma)\big] \\
&\qquad\quad + (1 - y_i)\big[o_i b_i \beta - p_i \ln(1 + \exp(b_i \beta)) + \ln \tbinom{p_i}{o_i}\big] \Big\} \qquad (46)
\end{aligned}
$$
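The ZIB probability function and the log-likelihood (46) can be sketched as follows. Function names and the tuple layout of the observations are hypothetical; the parameter vectors follow the ordering $\beta = (\beta_0, \beta_1)$ and $\gamma = (\gamma_0, \gamma_1, \gamma_2)$ used above.

```python
import math


def logistic(eta):
    """Inverse logit: 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def zib_pmf(o, p, pi_star, pi):
    """Probability of observing o replaced constructs out of p."""
    binomial = math.comb(p, o) * pi ** o * (1.0 - pi) ** (p - o)
    if o == 0:
        # A zero can stem from the zero state or from the binomial state.
        return pi_star + (1.0 - pi_star) * binomial
    return (1.0 - pi_star) * binomial


def zib_log_likelihood(beta, gamma, observations):
    """Log-likelihood for observations given as (o_i, p_i, TMAT_i, PTED_i)."""
    total = 0.0
    for o, p, tmat, pted in observations:
        pi_star = logistic(gamma[0] + gamma[1] * tmat + gamma[2] * pted)
        pi = logistic(beta[0] + beta[1] * tmat)
        total += math.log(zib_pmf(o, p, pi_star, pi))
    return total
```

Summing `zib_pmf` over all admissible values 0, ..., p gives one, and the result of `zib_log_likelihood` coincides with the last, fully expanded form of the log-likelihood.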

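[0247] The main problem with this log-likelihood is that its maximization with respect to $\beta$ and its maximization with respect to $\gamma$ cannot be treated separately. This would not be the case if we knew for each $O_i$ whether it is generated by the zero state or by the binomial state of the mixed model. Let the (unobservable) indicator variable $Z_i$ be one if the former situation has arisen and zero otherwise. The joint probability distribution of $O_i$ and $Z_i$ is

$$
f(O_i = o_i, Z_i = z_i) =
\begin{cases}
\pi_i^* & \text{for } o_i = 0 \text{ and } z_i = 1, \\
(1 - \pi_i^*) \binom{p_i}{o_i} \pi_i^{o_i} (1 - \pi_i)^{p_i - o_i} & \text{for } o_i \in \{0, \ldots, p_i\} \text{ and } z_i = 0.
\end{cases}
$$

[0248] If all $z_1, \ldots, z_n$ were known, then the log-likelihood could be written as

$$
\begin{aligned}
\ln L(\beta, \gamma; o_1, \ldots, o_n, z_1, \ldots, z_n) &= \sum_{i=1}^{n} \ln f(O_i = o_i, Z_i = z_i) \\
&= \sum_{i=1}^{n} \Big\{ z_i \ln(\pi_i^*) + (1 - z_i) \ln\big[(1 - \pi_i^*) \tbinom{p_i}{o_i} \pi_i^{o_i} (1 - \pi_i)^{p_i - o_i}\big] \Big\} \\
&= \sum_{i=1}^{n} \Big\{ z_i g_i \gamma - \ln\big[1 + \exp(g_i \gamma)\big] \Big\} + \sum_{i=1}^{n} (1 - z_i) \Big\{ o_i b_i \beta - p_i \ln\big(1 + \exp(b_i \beta)\big) + \ln \tbinom{p_i}{o_i} \Big\} \\
&= \ln L(\gamma; z_1, \ldots, z_n) + \ln L(\beta; o_1, \ldots, o_n, z_1, \ldots, z_n). \qquad (47)
\end{aligned}
$$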
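[0249] Since the $z_1, \ldots, z_n$ are missing, the direct independent estimation of $\beta$ and $\gamma$ is not possible. However, the EM algorithm [3] can be employed here. It alternates between estimating the unobserved information $z_1, \ldots, z_n$ given the current estimates of $\beta$ and $\gamma$ (in the so-called E step) and maximizing $\ln L(\gamma; z_1, \ldots, z_n)$ and $\ln L(\beta; o_1, \ldots, o_n, z_1, \ldots, z_n)$ conditional on this estimate of $z_1, \ldots, z_n$ in the M step. At the (k+1)th iteration, the following computations are performed:

1. E step:

[0250] If $o_i > 0$, then it cannot be from the zero state, and $z_i^{(k)} = 0$. For $o_i = 0$,

$$
\begin{aligned}
z_i^{(k)} &= E\big(Z_i \mid o_i = 0, \gamma^{(k)}, \beta^{(k)}\big) \\
&= 1 \cdot P\big(Z_i = 1 \mid o_i = 0, \gamma^{(k)}, \beta^{(k)}\big) + 0 \cdot P\big(Z_i = 0 \mid o_i = 0, \gamma^{(k)}, \beta^{(k)}\big) \\
&= P\big(Z_i = 1 \mid o_i = 0, \gamma^{(k)}, \beta^{(k)}\big) \\
&= \frac{f\big(o_i = 0, z_i = 1; \gamma^{(k)}, \beta^{(k)}\big)}{f\big(o_i = 0, z_i = 1; \gamma^{(k)}, \beta^{(k)}\big) + f\big(o_i = 0, z_i = 0; \gamma^{(k)}, \beta^{(k)}\big)} \\
&= \frac{1}{1 + \exp\big(-g_i \gamma^{(k)}\big) \big[1 + \exp\big(b_i \beta^{(k)}\big)\big]^{-p_i}}. \qquad (48)
\end{aligned}
$$

2. M step for $\gamma$:

[0251] Since

$$
\ln L\big(\gamma; z_1^{(k)}, \ldots, z_n^{(k)}\big) = \sum_{i=1}^{n} \Big\{ z_i^{(k)} g_i \gamma - \ln\big[1 + \exp(g_i \gamma)\big] \Big\}
= \sum_{i=1}^{n} \Big[ z_i^{(k)} \ln(\pi_i^*) + \big(1 - z_i^{(k)}\big) \ln(1 - \pi_i^*) \Big], \qquad (49)
$$

maximization of $\ln L(\gamma; z_1^{(k)}, \ldots, z_n^{(k)})$ implies the estimation of a logit model for the dependent variables $z_1^{(k)}, \ldots, z_n^{(k)}$ with the exogenous variables contained in $g_1, \ldots, g_n$. However, note that the $z_1^{(k)}, \ldots, z_n^{(k)}$ are not indicator variables, but real-valued variables bounded by zero and one. In a more general view, an unweighted binomial logistic regression with a binomial denominator of one for each observation is carried out.

3. M step for $\beta$:

[0252] Analogously, it can be shown that the maximization of the log-likelihood referring to $\beta$ is related to a weighted binomial logistic regression with weights $(1 - z_i^{(k)})$ and binomial denominators of $p_i$:

$$
\ln L\big(\beta; o_1, \ldots, o_n, z_1^{(k)}, \ldots, z_n^{(k)}\big) = \sum_{i=1}^{n} \big(1 - z_i^{(k)}\big) \ln\Big[ \tbinom{p_i}{o_i} \pi_i^{o_i} (1 - \pi_i)^{p_i - o_i} \Big] \qquad (50)
$$

As for the choice of starting values $\gamma^{(0)}$ and $\beta^{(0)}$, please refer to Hall [20].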

[0253] Goodness-of-fit measures for the zero inflated binomial model are not discussed in Hall's original paper. However, it is straightforward to apply McFadden's $R^2_{MF}$ and Aldrich's and Nelson's $R^2_{AN}$. The restricted model to be compared with the full one contains two probabilities $\pi^*$ and $\pi$ which do not depend on any project characteristics. By substituting

$$g_i = 1, \quad \gamma = \gamma_0, \quad b_i = 1 \quad \text{and} \quad \beta = \beta_0$$

for the vectors $g_i$, $\gamma$, $b_i$ and $\beta$ in equation (46), the corresponding log-likelihood is obtained:

$$
\begin{aligned}
\ln L_0(\beta_0, \gamma_0; o_1, \ldots, o_n) = \sum_{i=1}^{n} \Big\{ & y_i \ln\big[\exp(\gamma_0) + (1 + \exp(\beta_0))^{-p_i}\big] - \ln\big[1 + \exp(\gamma_0)\big] \\
& + (1 - y_i)\big[o_i \beta_0 - p_i \ln(1 + \exp(\beta_0)) + \ln \tbinom{p_i}{o_i}\big] \Big\} \qquad (51)
\end{aligned}
$$

[0254] Since the equations (47) to (50) can be adapted in the same way, it is possible to estimate the two parameters $\beta_0$ and $\gamma_0$ via the EM algorithm described above.
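For the restricted model the EM iteration is easy to sketch, because both M steps reduce to closed forms when only the constants $\gamma_0$ and $\beta_0$ are estimated: the intercept-only logit fit returns the mean of the $z_i$, and the weighted binomial fit returns a weighted success proportion. The code below is an illustration under these assumptions; the function names, the starting values of 0.5 and the data in the usage note are hypothetical and not taken from the study.

```python
import math


def restricted_zib_log_likelihood(pi_star, pi, counts, sizes):
    """Observed-data log-likelihood of the ZIB model without covariates."""
    total = 0.0
    for o, p in zip(counts, sizes):
        binomial = math.comb(p, o) * pi ** o * (1.0 - pi) ** (p - o)
        total += math.log((pi_star if o == 0 else 0.0) + (1.0 - pi_star) * binomial)
    return total


def fit_restricted_zib(counts, sizes, pi_star=0.5, pi=0.5, iterations=200):
    """EM iterations for the two constant probabilities pi* and pi."""
    history = []
    for _ in range(iterations):
        # E step: conditional probability of the zero state, zero whenever o_i > 0.
        z = [pi_star / (pi_star + (1.0 - pi_star) * (1.0 - pi) ** p) if o == 0 else 0.0
             for o, p in zip(counts, sizes)]
        # M step for gamma_0: an intercept-only logit fit equals the mean of z.
        pi_star = sum(z) / len(z)
        # M step for beta_0: weighted binomial proportion with weights (1 - z_i).
        successes = sum((1.0 - zi) * o for zi, o in zip(z, counts))
        trials = sum((1.0 - zi) * p for zi, p in zip(z, sizes))
        pi = successes / trials
        history.append(restricted_zib_log_likelihood(pi_star, pi, counts, sizes))
    gamma0 = math.log(pi_star / (1.0 - pi_star))
    beta0 = math.log(pi / (1.0 - pi))
    return pi_star, pi, gamma0, beta0, history
```

A call such as `fit_restricted_zib([0, 0, 0, 3, 1, 0, 0, 3, 4], [4] * 9)` returns the two probabilities, the corresponding logit parameters and the log-likelihood after each iteration; the latter never decreases, which is the defining property of EM. The sketch assumes that neither probability converges to zero or one, in which case the logit transform at the end would fail.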

[0255] According to the parameter estimates obtained for the project data available, the probability for attaining the zero state is best estimated by

$$\hat{\pi}_i^* = \frac{1}{1 + \exp(19.288 - 9.644 \cdot \mathrm{TMAT}_i - 34.752 \cdot \mathrm{PTED}_i)}$$

while the replacement probability in the binomial case should be estimated as follows:

$$\hat{\pi}_i = \frac{1}{1 + \exp(-3.170 + 1.379 \cdot \mathrm{TMAT}_i)}$$

The values attained for $R^2_{MF}$ and $R^2_{AN}$ are 0.592 and 0.982, respectively. While the value of McFadden's measure seems to be rather small (although one has to bear in mind that the maximum of one can never be reached in practical application), Aldrich's and Nelson's measure indicates an especially good fit.

[0256] Additional, more detailed results of the model estimation are shown in table 21. Besides the estimated redundancy levels $\hat{r}_i$, the estimated probabilities for attaining the zero state $\hat{\pi}_i^*$ and the estimated expected values of $r_i$,

$$\hat{E}(r_i) = \hat{\pi}_i^* \cdot 0 + (1 - \hat{\pi}_i^*)\,\hat{\pi}_i = (1 - \hat{\pi}_i^*)\,\hat{\pi}_i,$$

are listed.

Table 21: Results of fitting the zero inflated binomial model

                    A       B       E       Leo     Skartia   VPro    WinDeich   PPwin   Siwin
r (estimated)       0.000   0.000   0.000   0.750   0.339     0.000   0.000      0.709   1.000
pi* (estimated)     1.000   1.000   1.000   0.500   0.500     0.500   0.500      0.000   0.000
E(r) (estimated)    0.000   0.000   0.000   0.301   0.301     0.301   0.301      0.857   0.857
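The fitted equations can be evaluated with a few lines of code. The function names are hypothetical. The project ratings themselves are not listed in this section, so the inputs in the usage note (TMAT of 2 or 1 with PTED of 0) are illustrative values chosen only because they reproduce entries of table 21.

```python
import math


def zero_state_probability(tmat, pted):
    """Estimated probability of a redundancy level of zero."""
    return 1.0 / (1.0 + math.exp(19.288 - 9.644 * tmat - 34.752 * pted))


def replacement_probability(tmat):
    """Estimated success probability of the binomial state."""
    return 1.0 / (1.0 + math.exp(-3.170 + 1.379 * tmat))


def expected_redundancy(tmat, pted):
    """Estimated expected redundancy level (1 - pi*) * pi."""
    return (1.0 - zero_state_probability(tmat, pted)) * replacement_probability(tmat)
```

With a testing maturity of 2 and no specifically educated testers, the zero state probability is 0.5 and the expected redundancy level is about 0.301; with a testing maturity of 1 the zero state becomes very unlikely and the expected redundancy level rises to about 0.857.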



[0257] These results look very promising, too. However, one issue becomes obvious: Separation between different projects of one company has not been achieved. This is not a characteristic of the zero inflated binomial model, but of the data available. As already noted in section 3.3.2.1, the environmental factors related to testing hardly vary within one company while the estimated redundancy levels do to some extent. A possible remedy may be the collection of software-related pieces of information like its "testability". Whether such concepts can be operationalized and measured remains to be seen.



4 Conclusions 



[0258] In this study, a software failure model for data collected during systematic testing has been derived and discussed in detail. Several of the model parameters have physical interpretations and can be related to environmental factors of the software development and test process. While the development runtime performance index and the percentage of requirements changed after the specification phase have been found to be especially capable of estimating the fault density of the application at the beginning of the system test phase, prediction of the redundancy level during testing seems to be possible based on the testing maturity rating obtained from the PETS questionnaire and the percentage of testers with an education as test engineers. As expected, the capability score of most development processes is negatively correlated with the fault density of the software produced. To an even higher degree this is true for the selective maturity rating based on the scores of the (component) processes ENG.1.3, ENG.1.4, SUP.3, SUP.4, SUP.8, ORG.3 and ORG.4. However, the project specific factors mentioned above seem to be more influential on the actual outcomes.

[0259] Nevertheless, the questionnaire developed in the PETS project may be useful even for objectives other than collecting data for software failure models. Although the questionnaire cannot (and is not intended to) replace SPICE assessments, the ratings obtained from it seem to be in accordance with standard assessment results.

[0260] It is well understood that the findings are based on a small number of observations and that further test projects are needed in order to obtain better estimates and to evaluate the predictive quality of the models derived here. All the same, the current achievements of the PETS project are very promising.

A Diagrams of fitted data sets 

[0261] This appendix contains the diagrams referred to in section 2.5.3. The projects used have been described in the Justified Model Selection [18] and in the Baseline Experiment Reports [2, 5, 19, 21]. As for the projects carried out at imbus AG, the actual project names were replaced by single letters in order to protect confidential information concerning third parties.

[0262] Figures 12 to 26 show the (first) extended partial redundancy model fitted according to all four estimation 
techniques explained in sections 2.5.1 and 2.5.2. The common legend to these diagrams is as follows: 



Table 22: Legend for figures 12 to 26

Original data set
Extended partial redundancy model fitted using LS-Cum
Extended partial redundancy model fitted using LS-Delta
Extended partial redundancy model fitted using ML-NHPP
First extended partial redundancy model fitted using ML-Setup



[0263] Figures 27 to 41 contain the same data sets once again. Now the first extended partial redundancy model was fitted using the revised maximum likelihood estimation based on the likelihood implied by the model setup for all the projects, as described in section 2.5.3. The legend for these figures is given by table 23.

Table 23: Legend for figures 27 to 41

Original data set
First extended partial redundancy model fitted using ML-SetupC

Figure 12: Project A
Figure 13: Project B (rejected faults included)
Figure 14: Project C, first data set
Figure 15: Project C, second data set
Figure 17: Project E, two integration test cycles
Figure 19: Project E, second system test cycle
Figure 22: Pmi/crt 5!feartia
Figure 23: Project VPro
Figure 27: Project A
Figure 31: Project D
Figure 35: Project F
Figure 38: Project VPro
Figure 39: Project WinDeich

B PETS Questionnaire

[0264] On the following pages the PETS questionnaire version 1.0 used for assessing information on the baseline
projects is shown.
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[0265] Its structure is as follows: 

• General information 

- About the organization 

- About the assessor 

- About the capability of the software development process (including software unit test)

- About the capability of the system test process

• Specific information

- About the specific development project (excluding the system test phase) 

- About the specific software product 

- About the specific test project 

[0266] Further explanations, for example concerning the scenarios, are contained in section 3.1 of this document.
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10 
Question: Do you have a project plan?
  Score 0: No exact project plan exists. The project tasks are "brain saved".
  Score 1: A project plan with the most important milestones and high level tasks exists.
  Score 3: A detailed project plan with all phases, tasks to be done and milestones to be reached exists. It is tailored from a standard software engineering process.
  Score 5: The project plans are tailored from a company-wide software engineering process, for which a process improvement strategy exists.

Question: Do you identify the resources needed (like specific HW or tools) ahead of time? How do you calculate the resulting expenses? Do you use estimates?
  Score 0: No detailed resource planning is done. Normally only the end date is fixed by the customer.
  Score 1: The most important resources to conduct the projects are identified and planned. Use of resources is estimated and adjusted with the main project goals. For deriving the estimates, quantitative information from other projects or previous project cycles is often not available.
  Score 3: All needed resources for conducting projects are planned, and milestones are [...] estimates, quantitative information from other projects or previous project cycles is often not available. The project goals and deliverables are adjusted to the resources or vice versa.
  Score 5: Due to the analyses of the defined set of measures taken for all projects the influence of project characteristics on the resources needed is well understood. Based on this knowledge realistic milestones are estimated.

Question: Are the resources planned actually available? Is the project plan feasible?
  Score 0: [...] planning of resources. If there are plans and estimations they were done based on fully available resources.
  Score 1: [...] project plans define some needed resources. These resources are checked against available resources. But there is often a lack of precision. If the needed resources are not available often the plans are not revised.
  Score 3: All project phases and tasks are checked for the needed resources; however, there often is a lack of precision. These resource plans are checked against the available resources. If there are less available resources the project plans are adapted to the available resources. All project activities are prioritized to be able to share urgently needed but less available resources with other projects and/or tasks.
  Score 5: Estimates of the needed resources for all project phases and tasks are based on metrics collected in similar projects. If the available resources are not sufficient, they are allocated to the projects according to their priorities. Moreover, if limiting resources are detected, appropriate measures are taken for increasing their capacity.

Question: Do you check the current status of the project against the plan? Are early corrective actions taken?
  Score 0: Normally, the lack of resources is noticed shortly before end of project. Available resources are shared competitively, not cooperatively.
  Score 1: When a lack of resources occurs the project plans are adapted to the available resources if possible. Often there is only a new resource disposition and no adapted resource and/or project plan.
  Score 3: Regularly, reality is checked against the project plans. Sometimes, metrics are used, but they are not defined, and their collection is not integrated into the process. If problems are noticed, then the project is re-scheduled or re-planned.
  Score 5: A defined set of metrics is used to check e.g. employee workload or task fulfilling. Regularly, reality is checked against the project plans. Differences are corrected by re-[...]
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Does 
not
apply

□

□

□

□

Score
(0-5)














Question: Do projects follow a standard development process which is improved in accordance to experience?
  Score 0: No defined and established development process exists. Each project manager follows his own concepts when planning a project.
  Score 1: A general understanding of the basic project phases and activities exists. With this respect, project managers can make use of previous experience and therefore achieve a higher quality project plan. However, there is no detailed standard process for the whole organization, and the isolated high-level concepts, that exist for individual sub-processes, may [...]
  Score 3: A company-wide development process was established some time ago. It is tailored for each individual development project. Every project has its own quality assurance rules [...] development process. However, there is no defined procedure for changing the standard process in accordance [...]
  Score 5: Development projects are tailored from the company-wide development process. Feedback during and after project work is welcome to assist the definition of better processes, and it is used to continuously improve the process. The success of an improvement initiative can easily be evaluated based on measured performance metrics.

Question: Do you identify potential risks of a specific project? How do you keep track of them?
  Score 0: Abstract risks are not discussed. Actions are only taken when a problem has actually arisen.
  Score 1: Due to their previous experience with other projects, project managers are aware of potential risks. If their time allows, they try to keep track of the risks that they consider crucial.
  Score 3: Risks that may arise due to the specific characteristics of a project are identified in the project strategy. This list of risks is regularly updated during the project. Resources for monitoring these risks are allocated according to their priority (based on their potential impact and occurrence probability).
  Score 5: Monitoring the identified and prioritized risks is supported by a company-wide measurement program. If new risks are identified during one project they are added to the list of potential risks. Experience acquired with a risk is also used for generally describing an avoidance strategy.

Question: When and how do you normally become aware of problems in a project?
  Score 0: Problems (e.g. budget overruns) are normally noticed shortly before a project ends.
  Score 1: After reaching a milestone the project plan is adjusted to the new situation. For some recognized risks compensation strategies have been defined.
  Score 3: During the whole software development life cycle project parameters and resources are monitored. For the most common risks a strategy to avoid or compensate them exists. Some metrics to measure project progress and project success have been established.
  Score 5: Monitoring risks is supported by a company-wide measurement program. Therefore problems are recognized early enough to counteract according to defined compensation strategies.
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Does 
not 
apply 




□ 


□ 


□ 


□ 


□ 


□ 




Score 
(0-5) 
















Question: Are the results of reviews passed on to management?
  Score 0: [...]
  Score 1: In case of reviews the protocols are filed along with the other project documents. Management is not actively informed.
  Score 3: Information on the review process (reviews conducted, results, etc.) is collected by the QA manager who then reports to upper management.
  Score 5: In addition to being informed by the QA managers, the management can use a tool (e.g. a MIS) for automatically analyzing the QA results, e.g. with respect to trends or patterns.

Question: Which activities in your development process assure and support quality?
  Score 0: QA activities are not explicitly considered in the development process.
  Score 1: The QA activities concentrate on the actual "doing", e.g. reviews. No training activities are considered.
  Score 3: QA activities range from document and code reviews to QA training for the project members. Templates for documents are available. Process instructions are available.
  Score 5: QA activities (document review, code review, QA training) and tools (e.g. templates) are themselves subject to reviews and continuous improvement. The same holds true for the description of the standard software development process.

Question: What types of QA activities do the programmers conduct?
  Score 0: [...]
  Score 1: Programmers occasionally conduct peer code reviews. Every now and then they are involved in the review process of specification documents.
  Score 3: Peer reviews among programmers in one project, as well as between programmers in different projects are planned and conducted. Programmers are mandatory [...] test specifications.
  Score 5: Reviews and training are conducted, and development standards and instructions are derived. Besides, QA activities also contribute to the continuous improvement of programming activities and techniques as well as the standard software engineering process.

Question: What types of QA activities do specific employees, e.g. QA manager, conduct?
  Score 0: None.
  Score 1: The QA manager organizes and conducts code and document reviews. For new employees the QA manager gives an introduction to QA.
  Score 3: QA employees conduct reviews, regular training, prepare QA process instructions and develop the document templates. Additionally, they derive standards for coding and documentation.
  Score 5: Besides conducting reviews and training and deriving standards and instructions, they also contribute in the continuous improvement of QA activities as well as the standard software engineering process.

Question: How are these employees organized?
  Score 0: Not at all.
  Score 1: [...]
  Score 3: Specific QA employees are organized in the QA department. They are not within the responsibility of project managers. Thus, a certain degree of objectiveness is guaranteed. They cooperate with projects and report to upper management.
  Score 5: Through their organization the objectiveness of the QA employees is guaranteed. If it turns out that a certain structure does not meet this goal, then the structure is changed.
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• What was the scheduled effort of software development excluding system test (in person months)? 


• What was the overall effort actually spent on software development excluding system test (in person months)? 


How was this overall effort distributed over the following phases of software development? 


• Effort spent on requirements analysis (in person months) 


• Effort spent on software design (in person months) 


• Effort spent on development of methods and algorithms (in person months)


• Effort spent on programming (in person months)


• What was the scheduled runtime of software development excluding system test (in months)? 


• What was the actual runtime of software development excluding system test (in months)? 


How was this actual runtime distributed over the following phases of software development? 


• Runtime of the requirements analysis phase (in months) 


• Runtime of the software design phase (in months)


• Runtime of the development of methods and algorithms phase (in months)


• Runtime of the programming phase (in months)


• What is the average managing experience of the project manager(s) (in years of experience in managing software development projects)?


• What is the maximum size of the development team (in persons)?


• What is the general average skill level of the programming team (in years)? 

(Add up the years of professional programming experience for all the programmers in the development team and then divide the result by
the number of programmers.)
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5 


• What is the planned effort in this test cycle (in person months)?

• What is the actual effort in this test cycle (in person months)?

• What is the scheduled runtime of the test cycle (in months)?

• What is the actual runtime of the test cycle (in months)?

• What is the managing experience of the test team manager (in years of experience in managing test projects)?

• What is the maximum size of the test team (in persons)?

• What is the general average skill level of the test team (in years)?
(Add up the years of professional testing experience for all the testers in the test team and then divide the result by the number of testers.)

• What is the average number of projects of this kind (type of application to be tested, etc.) that the test team members [...] before?
(Add up the number of previous similar projects for all the testers in the test team and then divide the result by the number of testers.)

• How many of the testers in this project have a special education as test engineers?

• How many of the testers in this project have participated in the development of this piece of software?

• How many of the testers working on this project are new in the test team?

• How many software faults were detected and corrected during the test preparation phase (development of test specification)?

• What was the main strategy followed in this project when specifying the test cases and the order in which they were executed?
Mimicking the operational profile / Modularity of the test cases / Ease of recovery / Avoiding redundant code executions

• What percentage of the test cases are executed by a test automation tool?

• How many of the failure messages generated during the test execution phase were rejected due to errors in the test [...]
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Claims 

1. Method for failure prediction of software programs, in particular of the number of failures occurring during test and
the number of faults remaining in a software program after test, based on test data, failure data and/or software
maturity data and project-specific data using a statistical model comprising the steps of:

- collecting the test data and the number of failures occurring during testing and/or collecting the software
maturity and project-specific data from developers or testers of the software,
- processing the collected data by a data analysis, a parameter estimation, a data compression and/or a
prediction to obtain failure prediction data, and
- outputting said failure prediction data.

2. Method as claimed in claim 1,

wherein the software maturity data are collected by use of a questionnaire.

3. Method as claimed in any of the preceding claims,

wherein the test data reflect the progress of test activities, which are measured for a predetermined time duration
during a test and/or for a predetermined percentage by which the functionality of the software is covered by the test.

4. Method as claimed in any of the preceding claims,

wherein the failure data are put in relation to test progress data and reflect the number of failures of the software.

5. Method as claimed in claims 3 and 4, 

wherein the failure data reflect the number of failures of the software in the predetermined time duration and/or at
a predetermined percentage of the functionality covered by the test.
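
As an illustration only, the relation between failure data and test progress described in claims 3 to 5 might be tabulated as in the following sketch. The function name `failures_at_progress`, the checkpoint values and the sample data are assumptions made for this example and are not taken from the application.

```python
from bisect import bisect_right

def failures_at_progress(failure_progress, checkpoints):
    """Return the cumulative number of failures at each progress checkpoint.

    failure_progress: progress values (e.g. percent of functionality covered,
        or elapsed test time) at which each failure was observed.
    checkpoints: ascending progress values at which the count is reported.
    """
    observed = sorted(failure_progress)
    # bisect_right counts all failures observed at or before the checkpoint.
    return [bisect_right(observed, c) for c in checkpoints]

# Hypothetical data: coverage (in percent) reached when each failure occurred.
failure_progress = [3.0, 7.5, 7.5, 12.0, 30.0, 55.0, 80.0]
checkpoints = [10, 25, 50, 75, 100]
cumulative = failures_at_progress(failure_progress, checkpoints)
print(cumulative)  # [3, 4, 5, 6, 7]
```

The same function applies unchanged when progress is measured as a time duration instead of a coverage percentage.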

6. Method as claimed in any of the preceding claims,

wherein the software maturity data are collected for single subprocesses of the software development, in particular
requirements management, configuration management, design, failure management and/or test.

7. Method as claimed in claim 6,

wherein the description of subprocesses is predetermined by a process maturity model, in particular SPICE.

8. Method as claimed in claim 6 or 7,

wherein the quality of single subprocesses is measured using a scale, in particular ranging from zero for a bad
subprocess to five for an optimum subprocess.
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9. Method as claimed in claim 8, 

wherein the quality of subprocesses is measured using a questionnaire describing the subprocesses by use of
examples, in particular by use of scenarios, which can be compared to the process under consideration and from
which the appropriate subprocess can be selected or be marked as not relevant for the process under consideration.

10. Method as claimed in any of the preceding claims,

wherein a statistical prediction model is used in combination with statistical estimation methods such that it can
be adapted for different fields of use.

11. Method as claimed in any of the preceding claims,

wherein the software maturity data are reduced to a number of subprocesses which have the higher influence on
the failure progress.

12. Method as claimed in any of the preceding claims,

wherein univariate and/or multivariate and/or correlation analysis is used for selecting the software maturity data
that have the highest influence on the failure occurrences.

13. Method as claimed in claim 12,

further comprising the step of analysing the influence of process maturity and project characteristics.

14. Method as claimed in any of claims 11 to 13, 

wherein data compression is performed within a single process first by forming a median from the responses to
corresponding questions of the questionnaire and wherein appropriate data are thereafter selected for use in the
statistical model using a correlation analysis and/or regression analysis.
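
The two steps of claim 14 (median per process, then selection by correlation) could be sketched as follows. This is a minimal sketch under stated assumptions: the process names, the sample scores, the outcome variable `failure_density` and the selection threshold of 0.5 are all hypothetical, and the Pearson coefficient is used here merely as one possible correlation measure.

```python
from statistics import median

def compress_responses(responses_by_process):
    """Reduce the question scores of each subprocess to their median."""
    return {name: median(scores) for name, scores in responses_by_process.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / (var_x * var_y) ** 0.5

# Hypothetical questionnaire scores (0 to 5) for three projects.
projects = [
    {"design": [3, 3, 5], "test": [1, 3, 3]},
    {"design": [1, 1, 3], "test": [1, 1, 3]},
    {"design": [0, 1, 1], "test": [3, 3, 5]},
]
failure_density = [2.0, 4.0, 6.0]  # hypothetical outcome per project

# Step 1: data compression within each process by forming the median.
compressed = [compress_responses(p) for p in projects]

# Step 2: keep the processes whose medians correlate with the outcome.
correlations = {
    name: pearson([c[name] for c in compressed], failure_density)
    for name in compressed[0]
}
selected = [name for name, r in correlations.items() if abs(r) >= 0.5]
print(selected)  # ['design']
```

In this made-up data set the design scores fall as the failure density rises, so only "design" passes the threshold; a regression analysis could take the place of the correlation step.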

15. Method as claimed in any of the preceding claims,

wherein a zero inflated binomial model is used for modelling the redundancy level for predictions generated by
the partial redundancy model.

16. Method as claimed in claim 15,

wherein a multivariate analysis is used in said zero inflated binomial model.

17. Method as claimed in claim 15 or 16,

wherein the zero inflated binomial model models the redundancy multiplied by the number of average constructs
assuming a distribution mixed by a one point distribution and a binomial distribution.
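
For orientation, the mixture named in claim 17 can be written down as a probability mass function. The sketch below assumes the usual textbook form of a zero inflated binomial, in which the one point distribution sits at zero; the parameter names `n`, `p` and `pi_zero` are chosen for this example and do not appear in the application.

```python
from math import comb

def zib_pmf(k, n, p, pi_zero):
    """P(X = k) for a zero inflated binomial distribution.

    With probability pi_zero the outcome comes from the one point
    distribution at 0; otherwise it comes from Binomial(n, p).
    """
    binomial = comb(n, k) * p**k * (1 - p) ** (n - k)
    point_mass = pi_zero if k == 0 else 0.0
    return point_mass + (1 - pi_zero) * binomial

# Example values (assumed): 2 trials, success probability 0.5, 20 % extra zeros.
probabilities = [zib_pmf(k, n=2, p=0.5, pi_zero=0.2) for k in range(3)]
print(probabilities)
```

The probabilities over k = 0, ..., n sum to one, and setting `pi_zero` to zero recovers the plain binomial distribution.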

18. Method as claimed in any of the preceding claims,

wherein a maximum likelihood method is used for estimating a likelihood function for the partial redundancy model.

19. Method as claimed in claim 18,

wherein the maximum likelihood method is used for estimating parameter values which are plausible in view of
observed parameters, the maximum likelihood method being a measure for plausibility of a parameter constellation.
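
The idea of claim 19, choosing the parameter value that is most plausible in view of the observations, can be illustrated with a deliberately simple stand-in model. The sketch below does not implement the partial redundancy model; it assumes binomially distributed counts and searches a grid of candidate values for the one with the highest log-likelihood. The data and the grid resolution are assumptions of this example.

```python
from math import comb, log

def log_likelihood(p, observations, n):
    """Log-likelihood of success probability p for binomial counts out of n."""
    return sum(
        log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)
        for k in observations
    )

def ml_estimate(observations, n, grid_size=999):
    """Return the grid value of p that maximises the log-likelihood."""
    # Candidates lie strictly between 0 and 1 so that both logarithms exist.
    candidates = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(candidates, key=lambda p: log_likelihood(p, observations, n))

# Hypothetical observations: number of failing test cases in batches of 10.
observations = [2, 3, 5]
estimate = ml_estimate(observations, n=10)
print(estimate)
```

For this stand-in model the maximum is known in closed form (total failures divided by total trials, here 10/30), which gives a simple check on the grid search; for a model without a closed form, a numerical optimiser would replace the grid.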

20. Device for failure prediction of software programs, in particular of the number of failures occurring during test and
the number of faults remaining in a software program after test, comprising:

- means for collecting the test data and the number of failures occurring during testing and/or collecting the
software maturity and project-specific data from developers or testers of the software,
- means for processing the collected data by a data analysis, a parameter estimation, a data compression and/or
a prediction to obtain failure prediction data, and
- means for outputting said failure prediction data.

21. Computer program comprising program code means for implementing the steps of a method as claimed in any of
the claims 1 to 19 when said method is run on a computer.

22. Data carrier storing a computer program as claimed in claim 21.
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Test progress data 
(black box coverage);
collected during systematic testing either manually or
automatically by a test management tool

Test progress data
(code coverage);
automatically collected by a tool during systematic testing

Manual or automated logging of failure occurrence data

Results of an official SPICE assessment

Assessment of the software process maturity level using
the PETS questionnaire

Software process maturity data

Project specific data, e.g. code size

PETS prototype

• First extended partial redundancy model

• Second extended partial redundancy model

Dynamic/static estimates

• Number of failures to have occurred after i executed test cases

• Number of faults remaining in the software after i executed test cases

• Number of faults remaining after execution of x additional test cases

• Number of additional test cases to be executed until x faults remain

Fig. A
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Software process maturity data 
using the PETS questionnaire
from several industrial projects

Test progress data from
several industrial projects

Failure occurrence data from
several industrial projects

Project specific data from
several industrial projects

Univariate/multivariate analysis and correlation analysis

• Determination of the SPICE processes used in the PETS questionnaire which have
the highest influence on the failure occurrences
• Development of formulas for the static estimates

Fig. B
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